Alessia Bardi, 09/11/2021 11:58 AM
| 1 | 1 | Alessia Bardi | h1. DOIBoost |
|---|---|---|---|
| 2 | |||
| 3 | h4. DOIBoost: Crossref, Unpaywall, Microsoft Academic Graph, ORCID |
||
| 4 | |||
| 5 | The idea behind DOIBoost and its origin can be found in the paper (and related resources) at: |
||
| 6 | |||
| 7 | * La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11 . Open Access version available at: https://doi.org/10.5281/zenodo.1441071 |
||
| 8 | |||
| 9 | In short, the goal is to enrich the records available on Crossref with what is available on Unpaywall, Microsoft Academic Graph, and ORCID, by intersecting all those datasets by DOI. |
||
| 10 | The generation of DOIBoost consists of the following phases: |
||
| 11 | |||
| 12 | 1 Filter out Crossref records that: |
||
| 13 | * have blank title |
||
| 14 | * have one of the following publishers: "Test accounts", "CrossRef Test Account" |
||
| 15 | * have no authors with valid names, where valid means: not blank and different from all strings in this list: @List(",", "none none", "none, none", "none &na;", "(:null)", "test test test", "test test", "test", "&na; &na;")@ |
||
| 16 | * have "Addie Jackson" as author and "Elsevier BV" as publisher (empirically, these are test records) |
||
| 17 | |||
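The four filters above can be sketched as a single predicate. This is only an illustrative sketch, not the actual DOIBoost code: the record layout (@title@, @publisher@, @authors@ as plain strings) and the function names are assumptions, and comparing author names case-insensitively is also an assumption.

```python
# Illustrative sketch: field names and helper names are hypothetical.
INVALID_AUTHOR_NAMES = {
    ",", "none none", "none, none", "none &na;", "(:null)",
    "test test test", "test test", "test", "&na; &na;",
}
TEST_PUBLISHERS = {"Test accounts", "CrossRef Test Account"}


def is_valid_author(name):
    """A name is valid when it is not blank and not a known placeholder."""
    cleaned = (name or "").strip()
    # Assumption: placeholders are compared case-insensitively.
    return bool(cleaned) and cleaned.lower() not in INVALID_AUTHOR_NAMES


def keep_record(record):
    """Return True when the record survives all four filters of phase 1."""
    title = (record.get("title") or "").strip()
    publisher = record.get("publisher") or ""
    authors = record.get("authors") or []
    if not title:
        return False  # blank title
    if publisher in TEST_PUBLISHERS:
        return False  # test publisher
    if not any(is_valid_author(author) for author in authors):
        return False  # no author with a valid name
    if publisher == "Elsevier BV" and "Addie Jackson" in authors:
        return False  # empirically identified test records
    return True
```

For example, @keep_record({"title": "T", "publisher": "Springer", "authors": ["test", "(:null)"]})@ is rejected because none of the author names is valid.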
| 18 | 2 | Alessia Bardi | 1.1 Map Crossref links to projects/funders |
| 19 | Links to funding available in Crossref are mapped as funding relationships (@result@ --> @project@) applying the following mapping: |
||
| 20 | |||
| 21 | | funder | funding project | notes | |
||
| 26 | |||
| 27 | * If the funder DOI is one of 10.13039/100010663, 10.13039/100010661, 10.13039/501100007601, 10.13039/501100000780, 10.13039/100010665, a link to the corresponding *H2020 project* is created for each @award@ (identified as a sequence of 4 to 9 digits) |
||
| 28 | * If the funder DOI is one of 10.13039/100011199, 10.13039/100004431, 10.13039/501100004963, 10.13039/501100000780, a link to the corresponding *FP7 project* is created for each @award@ (identified as a sequence of 4 to 9 digits) |
||
| 29 | * If the funder DOI is 10.13039/501100000781, a link to the corresponding *FP7 or H2020 project* is created for each @award@ (identified as a sequence of 4 to 9 digits) |
||
| 30 | * 10.13039/100000001 |
||
| 31 | |||
| 32 | |||
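The funder rules above could be sketched as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the function names and the @(programme, grant_id)@ output shape are hypothetical, and the regular expression is only one possible reading of "a sequence of 4 to 9 digits". Funder 10.13039/501100000780 appears in both the H2020 and the FP7 list above, so the sketch returns both candidate programmes for it.

```python
import re

# Funder DOIs copied from the rules above.
H2020_FUNDERS = {
    "10.13039/100010663", "10.13039/100010661", "10.13039/501100007601",
    "10.13039/501100000780", "10.13039/100010665",
}
FP7_FUNDERS = {
    "10.13039/100011199", "10.13039/100004431", "10.13039/501100004963",
    "10.13039/501100000780",
}
FP7_OR_H2020_FUNDERS = {"10.13039/501100000781"}

# Assumption: an award code is a standalone run of 4 to 9 digits.
AWARD_PATTERN = re.compile(r"(?<!\d)\d{4,9}(?!\d)")


def candidate_programmes(funder_doi):
    """Return the funding programmes a funder DOI may refer to."""
    programmes = set()
    if funder_doi in H2020_FUNDERS or funder_doi in FP7_OR_H2020_FUNDERS:
        programmes.add("H2020")
    if funder_doi in FP7_FUNDERS or funder_doi in FP7_OR_H2020_FUNDERS:
        programmes.add("FP7")
    return programmes


def funding_links(funder_doi, awards):
    """Return (programme, grant_id) pairs for every award code found."""
    programmes = sorted(candidate_programmes(funder_doi))
    links = []
    for award in awards:
        for grant_id in AWARD_PATTERN.findall(award):
            for programme in programmes:
                links.append((programme, grant_id))
    return links
```

Note that a purely digit-based pattern also matches numbers that are not grant codes (for instance the "2020" in a string such as "H2020-654024"), so a real implementation would probably need to check each candidate against the list of known projects.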
| 33 | 1 | Alessia Bardi | 2 Intersect Crossref with Unpaywall by DOI (DOIBoost1). The records are enriched with: |
| 34 | * TODO: AUTHORS? |
||
| 35 | * one @instance@ with |
||
| 36 | ** the @best_oa_location@ of Unpaywall |
||
| 37 | ** @color@ set as follows: @green@ if the host is a repository; @gold@ if the host is a publisher and the journal is open access; @hybrid@ if the host is a publisher, the journal is not open access, but there is a license; @bronze@ if no license is available. |
||
| 38 | |||
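The @color@ rule above can be written as a small decision function. This is a sketch only: the function and parameter names are hypothetical, and it assumes that the host of an Unpaywall location is always either a repository or a publisher.

```python
def oa_color(host_type, journal_is_oa, license_id):
    """Derive the open access color from the Unpaywall best OA location.

    Assumption: host_type is either "repository" or "publisher".
    """
    if host_type == "repository":
        return "green"
    # From here on the host is assumed to be a publisher.
    if journal_is_oa:
        return "gold"
    if license_id:
        return "hybrid"
    return "bronze"
```

For instance, a publisher-hosted copy in a journal that is not open access gets @hybrid@ when a license is present and @bronze@ otherwise.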
| 39 | 3 Intersect DOIBoost1 with ORCID (DOIBoost2). The records are enriched with the ORCID identifiers of their authors |
||
| 40 | |||
| 41 | 4 Intersect DOIBoost2 with Microsoft Academic Graph (DOIBoost3). The records are enriched with: |
||
| 42 | * abstracts |
||
| 43 | * MAG identifiers of authors |
||
| 44 | * affiliation relationships |
||
| 45 | * subjects (MAG FieldsOfStudy) |
||
| 46 | * conference or journal information (in the @journal@ field) TODO: or @container@, in case of the dump? |
||
| 47 | * [TO BE REMOVED] instances with URL from MAG |
||
| 48 | |||
| 49 | 5 Enrich DOIBoost3 with hosting data sources (@hostedby@) and access right information. In this phase we intersect DOIBoost3 with a dataset composed of journals from OpenAIRE, Crossref, and the ISSN gold list. Each journal comes with its International Standard Serial Numbers (issn, eissn, lissn) and, when available, a flag that tells if the journal is open access. The intersection is done on the basis of the International Standard Serial Numbers. The records with a @journal.[l|e]issn@ that match are enriched as follows: |
||
| 50 | * Each instance gains the @hostedby@ information. |
||
| 51 | * If the journal is open access, the access rights of the instances are also set to "Open Access" with "gold" route. |
||
| 52 | |||
| 53 | The @hostedby@ of records that do not match is set to "Unknown Repository". |
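Phase 5 can be sketched as a lookup on an ISSN index. This is an illustrative sketch, not the actual implementation: the dictionary layout (@name@, @open_access@, @instances@, @accessright@, @route@) and the function names are assumptions, as is the upper-case normalisation of the ISSN values.

```python
UNKNOWN_REPOSITORY = "Unknown Repository"
ISSN_FIELDS = ("issn", "eissn", "lissn")


def issn_set(journal):
    """Collect the non-empty ISSN values of a journal (assumed layout)."""
    return {journal[f].strip().upper() for f in ISSN_FIELDS if journal.get(f)}


def build_issn_index(journals):
    """Map every known ISSN to the journal it belongs to."""
    index = {}
    for journal in journals:
        for issn in issn_set(journal):
            index.setdefault(issn, journal)  # first journal wins on clashes
    return index


def enrich_hostedby(record, issn_index):
    """Set hostedby (and gold open access, if applicable) on each instance."""
    journal = record.get("journal") or {}
    match = next(
        (issn_index[i] for i in sorted(issn_set(journal)) if i in issn_index),
        None,
    )
    for instance in record.get("instances", []):
        if match is None:
            instance["hostedby"] = UNKNOWN_REPOSITORY
            continue
        instance["hostedby"] = match["name"]
        if match.get("open_access"):
            instance["accessright"] = "Open Access"
            instance["route"] = "gold"
    return record
```

The index is built once from the journal dataset and then reused for every DOIBoost3 record, so each record costs at most three lookups.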