DOIBoost » History » Version 2

Alessia Bardi, 09/11/2021 11:58 AM

h1. DOIBoost

h4. DOIBoost: Crossref, Unpaywall, Microsoft Academic Graph, ORCID

The idea behind DOIBoost and its origin are described in the following paper (and related resources):

* La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11. Open Access version available at: https://doi.org/10.5281/zenodo.1441071

In short, the goal is to enrich the records available on Crossref with the information available on Unpaywall, Microsoft Academic Graph, and ORCID, by intersecting all those datasets by DOI.
The generation of DOIBoost consists of the following phases:

1 Filter out Crossref records that:
* have a blank title
* have one of the following publishers: "Test accounts", "CrossRef Test Account"
* have no authors with valid names, where valid means not blank and different from every string in this list: @List(",", "none none", "none, none", "none &na;", "(:null)", "test test test", "test test", "test", "&na; &na;")@
* have "Addie Jackson" as author and "Elsevier BV" as publisher (empirically, these are test records)
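The four criteria above can be read as a single predicate over a record. The following is a minimal sketch, not the actual workflow code: the field names @title@, @publisher@ and @authors@, the helper names, and the case-insensitive comparison of author names are all assumptions made for illustration.

```python
# Illustrative sketch only: field names and helpers are hypothetical.
INVALID_AUTHOR_NAMES = {
    ",", "none none", "none, none", "none &na;", "(:null)",
    "test test test", "test test", "test", "&na; &na;",
}
TEST_PUBLISHERS = {"Test accounts", "CrossRef Test Account"}


def has_valid_name(name):
    """A name is valid when it is not blank and not a known placeholder.

    Comparing in lower case is an assumption of this sketch.
    """
    cleaned = (name or "").strip()
    return bool(cleaned) and cleaned.lower() not in INVALID_AUTHOR_NAMES


def keep_record(record):
    """Return True when the record survives all four filtering criteria."""
    title = (record.get("title") or "").strip()
    publisher = record.get("publisher") or ""
    authors = record.get("authors") or []
    if not title:
        return False  # blank title
    if publisher in TEST_PUBLISHERS:
        return False  # test publisher
    if not any(has_valid_name(a) for a in authors):
        return False  # no author with a valid name
    if "Addie Jackson" in authors and publisher == "Elsevier BV":
        return False  # empirically identified test records
    return True
```

A record is kept only if @keep_record@ returns true, so the order of the checks does not affect the result.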

1.1 Map Crossref links to projects/funders
Links to funding available in Crossref are mapped to funding relationships (@result@ --> @project@) by applying the following mapping:

| funder | funding project | notes |

* If the funder DOI is one of 10.13039/100010663, 10.13039/100010661, 10.13039/501100007601, 10.13039/501100000780, 10.13039/100010665, for each @award@ (identified as a sequence of 4 to 9 digits) a link to the corresponding *H2020 project* is created
* If the funder DOI is one of 10.13039/100011199, 10.13039/100004431, 10.13039/501100004963, 10.13039/501100000780, for each @award@ (identified as a sequence of 4 to 9 digits) a link to the corresponding *FP7 project* is created
* If the funder DOI is 10.13039/501100000781, for each @award@ (identified as a sequence of 4 to 9 digits) a link to the corresponding *FP7 or H2020 project* is created
* 10.13039/100000001
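The funder rules above can be sketched as a lookup followed by award extraction. This is an illustration under assumptions, not the workflow's code: the function names and the @(programme, award)@ output shape are hypothetical, and taking the first run of 4 to 9 digits in the award string is one possible reading of "identified as a sequence of 4 to 9 digits". Because 10.13039/501100000780 appears in both lists, the sketch emits a candidate link for each programme; how the real workflow resolves that overlap is not stated here. Funder 10.13039/100000001 has no rule in the list, so the sketch produces no link for it.

```python
import re

H2020_FUNDERS = {
    "10.13039/100010663", "10.13039/100010661", "10.13039/501100007601",
    "10.13039/501100000780", "10.13039/100010665",
}
FP7_FUNDERS = {
    "10.13039/100011199", "10.13039/100004431", "10.13039/501100004963",
    "10.13039/501100000780",
}
FP7_OR_H2020_FUNDER = "10.13039/501100000781"

# Assumption: the award number is the first run of 4 to 9 digits.
AWARD_PATTERN = re.compile(r"[0-9]{4,9}")


def extract_award(award):
    """Return the award number found in the string, or None."""
    match = AWARD_PATTERN.search(award or "")
    return match.group(0) if match else None


def programmes_for(funder_doi):
    """List the candidate funding programmes for a funder DOI."""
    programmes = []
    if funder_doi in H2020_FUNDERS:
        programmes.append("H2020")
    if funder_doi in FP7_FUNDERS or funder_doi == FP7_OR_H2020_FUNDER:
        programmes.append("FP7")
    if funder_doi == FP7_OR_H2020_FUNDER:
        programmes.append("H2020")
    return programmes


def funding_links(funder_doi, awards):
    """Return (programme, award number) pairs for every usable award."""
    links = []
    for award in awards:
        number = extract_award(award)
        if number is None:
            continue  # no 4-to-9 digit sequence: nothing to link
        for programme in programmes_for(funder_doi):
            links.append((programme, number))
    return links
```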

2 Intersect Crossref with Unpaywall by DOI (DOIBoost1). The records are enriched with:
* TODO: AUTHORS?
* one @instance@ with
** the @best_oa_location@ of Unpaywall
** @color@ set as follows: @green@ if the host is a repository; @gold@ if the host is the publisher and the journal is open access; @hybrid@ if the host is the publisher, the journal is not open access, but a license is present; @bronze@ if no license is available.
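The @color@ rules above form a simple decision chain. The sketch below assumes three inputs named after fields commonly found in Unpaywall records (@host_type@, @journal_is_oa@, @license@); the function name and the exact way these values are read from a record are assumptions.

```python
def oa_color(host_type, journal_is_oa, license):
    """Apply the colour rules in the order they are listed above."""
    if host_type == "repository":
        return "green"
    if journal_is_oa:
        return "gold"    # hosted by the publisher in an open access journal
    if license:
        return "hybrid"  # journal not open access, but a license is present
    return "bronze"      # no license available
```

For example, @oa_color("publisher", False, "cc-by")@ yields @hybrid@, while the same call with @license=None@ yields @bronze@.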

3 Intersect DOIBoost1 with ORCID by DOI (DOIBoost2). The records are enriched with the ORCID identifiers of their authors.
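The intersection phases share the same shape: a join on DOI followed by an enrichment of the matching record. A minimal sketch for the ORCID case follows. It makes several assumptions that are not stated in this page: DOIs are compared in lower case, records without a match are kept unchanged, and the ORCID identifiers are attached to a hypothetical @author_orcids@ field without matching them to individual authors.

```python
def enrich_with_orcid(records, orcid_by_doi):
    """Join records with ORCID data on DOI.

    `orcid_by_doi` maps a lower-cased DOI to a list of ORCID identifiers.
    """
    enriched = []
    for record in records:
        doi = (record.get("doi") or "").lower()  # assumed join key
        orcids = orcid_by_doi.get(doi, [])
        # Copy the record so the input is not modified.
        enriched.append({**record, "author_orcids": list(orcids)})
    return enriched
```

Assigning each identifier to the right author would need an additional name-matching step, which is outside the scope of this sketch.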

4 Intersect DOIBoost2 with Microsoft Academic Graph (DOIBoost3). The records are enriched with:
* abstracts
* MAG identifiers of authors
* affiliation relationships
* subjects (MAG FieldsOfStudy)
* conference or journal information (in the @journal@ field) TODO: or @container@, in case of the dump?
* [TO BE REMOVED] instances with URL from MAG

5 Enrich DOIBoost3 with hosting data sources (@hostedby@) and access right information. In this phase we intersect DOIBoost3 with a dataset composed of journals from OpenAIRE, Crossref, and the ISSN gold list. Each journal comes with its International Standard Serial Numbers (issn, eissn, lissn) and, when available, a flag that tells whether the journal is open access. The intersection is based on these ISSNs. The records whose @journal.[l|e]issn@ matches are enriched as follows:
* Each instance gains the @hostedby@ information.
* If the journal is open access, the access rights of the instances are also set to "Open Access" with "gold" route.

The @hostedby@ of records that do not match is set to "Unknown Repository".
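The @hostedby@ phase can be sketched as an ISSN lookup. The code below is an illustration only: the record layout (@journal@, @instances@), the journal fields @name@ and @open_access@, and the output fields @access_right@ and @oa_route@ are assumptions chosen for readability.

```python
UNKNOWN_REPOSITORY = "Unknown Repository"
ISSN_KEYS = ("issn", "eissn", "lissn")


def build_issn_index(journals):
    """Index the journal dataset by every ISSN variant it declares."""
    index = {}
    for journal in journals:
        for key in ISSN_KEYS:
            value = journal.get(key)
            if value:
                index[value] = journal
    return index


def enrich_hostedby(record, issn_index):
    """Set hostedby (and gold open access, when flagged) on each instance."""
    journal_info = record.get("journal") or {}
    match = None
    for key in ISSN_KEYS:
        match = issn_index.get(journal_info.get(key))
        if match:
            break
    instances = []
    for instance in record.get("instances", []):
        updated = dict(instance)
        if match is None:
            updated["hostedby"] = UNKNOWN_REPOSITORY
        else:
            updated["hostedby"] = match["name"]
            if match.get("open_access"):
                updated["access_right"] = "Open Access"
                updated["oa_route"] = "gold"
        instances.append(updated)
    return {**record, "instances": instances}
```

Building the index once and reusing it for every record keeps each lookup constant-time, which matters when the journal dataset is joined against a large number of records.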