DOIBoost » History » Version 7
Alessia Bardi, 10/11/2021 02:10 PM
More input info for DOIBoost
h1. DOIBoost

h2. DOIBoost: Crossref, Unpaywall, Microsoft Academic Graph, ORCID

The idea behind DOIBoost and its origin can be found in the paper (and related resources) at:

* La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11. Open Access version available at: https://doi.org/10.5281/zenodo.1441071

In short, the goal is to enrich the records available on Crossref with the information available on Unpaywall, Microsoft Academic Graph, and ORCID, by intersecting all those datasets by DOI.

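The DOI-based enrichment can be illustrated with a minimal sketch. It is not the actual DOIBoost implementation: the record shape (dictionaries with a @doi@ key), the helper names @normalize_doi@ and @enrich_by_doi@, and the choice to keep Crossref records that have no match are assumptions made for illustration. The only fact relied upon is that DOIs are case-insensitive, so they are lowercased before comparison.

```python
def normalize_doi(doi):
    """Trim and lowercase a DOI so that equivalent DOIs compare equal."""
    return doi.strip().lower()


def enrich_by_doi(crossref_records, other_records, merge):
    """Enrich Crossref records with the records of another dataset sharing the same DOI.

    Assumption: Crossref records without a match are kept unchanged.
    """
    index = {normalize_doi(r["doi"]): r for r in other_records}
    enriched = []
    for record in crossref_records:
        match = index.get(normalize_doi(record["doi"]))
        enriched.append(merge(record, match) if match else record)
    return enriched


# Hypothetical sample data, for illustration only.
crossref = [{"doi": "10.1000/ABC", "title": "A"}, {"doi": "10.1000/xyz", "title": "B"}]
unpaywall = [{"doi": "10.1000/abc", "is_oa": True}]
result = enrich_by_doi(
    crossref, unpaywall, lambda rec, extra: {**rec, "is_oa": extra["is_oa"]}
)
```

Here the first record gains the @is_oa@ value from its match, while the second is returned untouched.
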
h3. Inputs

* *Crossref*: dump available to Crossref subscribers via the Metadata Plus service, updated once a month.
* *Microsoft Academic Graph*: version downloaded on 2021-02-15. We plan to fetch the latest version in December 2021, before MAG is retired.
* *ORCID*: baseline dump obtained in XX/XX/XXXX from URL, regularly updated every week from the ORCID API available at URL
* *Unpaywall*: public database snapshot downloaded in March 2021. Unpaywall updates it twice a year (https://unpaywall.org/products/snapshot).

The generation of DOIBoost consists of the following phases:

h3. 1 Filter out Crossref records that

* have a blank title
* have one of the following publishers: "Test accounts", "CrossRef Test Account"
* have no authors with valid names, where valid means: not blank and different from all strings in this list: @List(",", "none none", "none, none", "none &na;", "(:null)", "test test test", "test test", "test", "&na; &na;")@
* have "Addie Jackson" as author and "Elsevier BV" as publisher (empirically, these are test records)

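The filtering rules above can be sketched as a single predicate. This is only an illustration under stated assumptions: the record shape (a dictionary with @title@, @publisher@ and a list of author name strings under @authors@) and the function names are hypothetical, and comparing author names case-insensitively is an assumption not stated in the rules.

```python
# Author names considered invalid, copied from the list in the rules above.
INVALID_AUTHOR_NAMES = {
    ",", "none none", "none, none", "none &na;", "(:null)",
    "test test test", "test test", "test", "&na; &na;",
}
TEST_PUBLISHERS = {"Test accounts", "CrossRef Test Account"}


def is_valid_author_name(name):
    """A name is valid when it is not blank and not in the invalid list."""
    cleaned = (name or "").strip()
    # Assumption: the comparison ignores case.
    return cleaned != "" and cleaned.lower() not in INVALID_AUTHOR_NAMES


def keep_record(record):
    """Return True when the record survives all the filters."""
    if not (record.get("title") or "").strip():
        return False  # blank title
    publisher = record.get("publisher") or ""
    if publisher in TEST_PUBLISHERS:
        return False  # test publisher
    authors = record.get("authors") or []
    if not any(is_valid_author_name(a) for a in authors):
        return False  # no author with a valid name
    if "Addie Jackson" in authors and publisher == "Elsevier BV":
        return False  # empirically identified test records
    return True
```

A list of records would then be filtered with @[r for r in records if keep_record(r)]@.
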
h3. 2 Map Crossref links to projects/funders

Links to funding available in Crossref are mapped to funding relationships (@result@ -- @isProducedBy@ --> @project@) by applying the following mapping:

| *funder* | *grant code* | *Link to* |
| DOI: {10.13039/100010663, 10.13039/100010661, 10.13039/501100007601, 10.13039/501100000780, 10.13039/100010665} OR name: 'European Union’s Horizon 2020 research and innovation program' | series of 4-9 digits in @award@ | Link to H2020 project |
| DOI: {10.13039/100011199, 10.13039/100004431, 10.13039/501100004963, 10.13039/501100000780} | series of 4-9 digits in @award@ | Link to FP7 project |
| DOI: 10.13039/501100000781 OR name: 'European Union's' | series of 4-9 digits in @award@ | Link to FP7 or H2020 project |
| DOI: 10.13039/100000001 | @award@ | Link to NSF project |
| DOI: 10.13039/501100001665 OR name: {'The French National Research Agency (ANR)', 'The French National Research Agency'} | @award@ | Link to ANR project |
| DOI: 10.13039/501100002341 | @award@ | Link to Academy of Finland project |
| DOI: 10.13039/501100001602 | @award@, removing the initial 'SFI' if present | Link to SFI project |
| DOI: 10.13039/501100000923 | @award@ | Link to ARC project |
| DOI: 10.13039/501100000038 | @award@ is ignored: we cannot map the project codes in Crossref to project codes in OpenAIRE | Link to NSERC (@unidentified@ project) |
| DOI: 10.13039/501100000155 | @award@ is ignored: we cannot map the project codes in Crossref to project codes in OpenAIRE | Link to SSHRC (@unidentified@ project) |
| DOI: 10.13039/501100000024 | @award@ is ignored: we cannot map the project codes in Crossref to project codes in OpenAIRE | Link to CIHR (@unidentified@ project) |
| DOI: 10.13039/501100002848 OR name: 'CONICYT, Programa de Formación de Capital Humano Avanzado' | @award@ | Link to CONICYT project |
| DOI: 10.13039/501100003448 | series of 4-9 digits in @award@ | Link to GSRT project |
| DOI: 10.13039/501100010198 | @award@ | Link to SGOV project |
| DOI: 10.13039/501100004564 | series of 4-9 digits in @award@ | Link to MESTD project |
| DOI: 10.13039/501100003407 | @award@ | Link to MIUR project. Since OpenAIRE has a small subset of MIUR projects, a link to the MIUR funder (@unidentified@ project) is also generated |
| DOI: {10.13039/501100006588, 10.13039/501100004488} | @award@, removing the 'Project No' and 'HRZZ' prefixes, if present | Link to HRZZ or MZOS project |
| DOI: 10.13039/501100006769 | @award@ | Link to Russian Science Foundation project |
| DOI: 10.13039/501100001711 | @award@ after '_' and before '/' | Link to SNSF project |
| DOI: 10.13039/501100004410 | @award@ | Link to TUBITAK project |
| DOI: 10.13039/100004440 OR name: 'Wellcome Trust Masters Fellowship' | @award@ | Link to Wellcome Trust specific project and to the @unidentified@ project |

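Some of the grant code rules in the table can be sketched as small normalization helpers. These are illustrative only: the function names are hypothetical, and several details are assumptions not stated in the table, namely that the first run of exactly 4 to 9 digits is taken, that longer digit runs are not matched, and that an @award@ missing the expected delimiters is returned as far as it can be processed.

```python
import re

# A run of 4 to 9 digits that is not part of a longer digit run (assumption).
DIGITS_4_TO_9 = re.compile(r"(?<!\d)\d{4,9}(?!\d)")


def extract_numeric_code(award):
    """Rule 'series of 4-9 digits in award' (e.g. H2020, FP7, GSRT, MESTD).

    Assumption: the first matching run is used; None means no code was found.
    """
    match = DIGITS_4_TO_9.search(award)
    return match.group(0) if match else None


def normalize_sfi(award):
    """Rule 'award, removing the initial SFI if present'."""
    return award[3:] if award.startswith("SFI") else award


def normalize_snsf(award):
    """Rule 'award after _ and before /'.

    Assumption: a missing '_' or '/' leaves that side of the string untouched.
    """
    after_underscore = award.split("_", 1)[1] if "_" in award else award
    return after_underscore.split("/", 1)[0]
```

For instance, @normalize_snsf("31003A_173281/1")@ yields @173281@. Note that under the first-run assumption an award such as @H2020-654021@ would yield @2020@, so a real implementation may need a stricter rule.
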
h3. 3 Intersect Crossref with Unpaywall by DOI (DOIBoost1)

The records are enriched with:

* TODO: AUTHORS?
* one @instance@ with
** the @best_oa_location@ of Unpaywall
** @color@ set as follows:
*** @green@ if the host is a repository;
*** @gold@ if the host is the publisher and the journal is open access;
*** @hybrid@ if the host is the publisher, the journal is not open access, but there is a license;
*** @bronze@ if no license is available.

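The @color@ rules above can be sketched as a small decision function. The function name and its parameters are hypothetical; they are loosely modelled on the @host_type@, @journal_is_oa@ and @license@ fields of an Unpaywall record, which is an assumption about how the inputs are obtained. The sketch also assumes that a host that is not a repository is the publisher.

```python
def oa_color(host_type, journal_is_oa, license_name):
    """Return the open access color for an Unpaywall best OA location.

    host_type: "repository" or "publisher" (assumed values).
    journal_is_oa: True when the journal is open access.
    license_name: license string, or None/"" when no license is available.
    """
    if host_type == "repository":
        return "green"
    # From here on the host is assumed to be the publisher.
    if journal_is_oa:
        return "gold"
    if license_name:
        return "hybrid"
    return "bronze"
```

For example, @oa_color("publisher", False, "cc-by")@ returns @hybrid@, while the same call with no license returns @bronze@.
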
h3. 4 Intersect DOIBoost1 with ORCID (DOIBoost2)

The records are enriched with the ORCID identifiers of their authors.

h3. 5 Intersect DOIBoost2 with Microsoft Academic Graph (DOIBoost3)

The records are enriched with:

* abstracts
* MAG identifiers of authors
* affiliation relationships
* subjects (MAG FieldsOfStudy)
* conference or journal information (in the @journal@ field) TODO: or @container@, in case of the dump?
* [TO BE REMOVED] instances with URL from MAG

h3. 6 Enrich DOIBoost3 with hosting data sources (@hostedby@) and access right information

In this phase we intersect DOIBoost3 with a dataset composed of journals from OpenAIRE, Crossref, and the ISSN gold list. Each journal comes with its International Standard Serial Numbers (issn, eissn, lissn) and, when available, a flag that tells whether the journal is open access. The intersection is based on the International Standard Serial Numbers. The records with a matching @journal.[l|e]issn@ are enriched as follows:

* Each instance gains the @hostedby@ information.
* If the journal is open access, the access rights of the instances are also set to "Open Access" with "gold" route.

The @hostedby@ of records that do not match is set to "Unknown Repository".
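
The ISSN-based enrichment can be sketched as follows. This is a simplified illustration, not the actual implementation: the record and journal shapes, the field names @name@, @open_access@, @instances@, @accessright@ and @oa_route@, and the decision to match any of the three ISSN variants against any other are all assumptions.

```python
UNKNOWN_REPOSITORY = "Unknown Repository"
ISSN_KEYS = ("issn", "eissn", "lissn")


def build_issn_index(journals):
    """Index the journal dataset by every ISSN variant it declares."""
    index = {}
    for journal in journals:
        for key in ISSN_KEYS:
            value = journal.get(key)
            if value:
                index[value] = journal
    return index


def enrich_hostedby(record, issn_index):
    """Set hostedby (and gold open access, when flagged) on each instance."""
    journal_info = record.get("journal") or {}
    match = None
    for key in ISSN_KEYS:
        match = issn_index.get(journal_info.get(key))
        if match:
            break
    for instance in record.get("instances", []):
        if match is None:
            instance["hostedby"] = UNKNOWN_REPOSITORY
            continue
        instance["hostedby"] = match["name"]
        if match.get("open_access"):
            instance["accessright"] = "Open Access"
            instance["oa_route"] = "gold"
    return record


# Hypothetical sample data, for illustration only.
journal_index = build_issn_index(
    [{"name": "Journal A", "issn": "1234-5678", "open_access": True}]
)
matched = enrich_hostedby(
    {"journal": {"eissn": "1234-5678"}, "instances": [{}]}, journal_index
)
unmatched = enrich_hostedby(
    {"journal": {"issn": "0000-0000"}, "instances": [{}]}, journal_index
)
```

In this example the instance of @matched@ is hosted by "Journal A" and marked as gold open access, while the instance of @unmatched@ falls back to "Unknown Repository".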