h1. OpenAIRE entity identifier and PID mapping policy

Version 2: Claudio Atzori, 09/11/2021 03:06 PM

(copied from https://docs.google.com/document/d/1PnvZpmhbanJu3AeOT-zdIyMKIHoGKC4_Z0UtDFDZAeM/edit#)

OpenAIRE assigns an internal identifier to each object it collects.
By default, the internal identifier is generated as @sourcePrefix::md5(localId)@, where:
* @sourcePrefix@ is a 12-character namespace prefix assigned to the data source at registration time;
* @localId@ is the identifier assigned to the object by the data source.

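A minimal Python sketch of the default rule follows. It assumes that @md5(...)@ means the lowercase hexadecimal MD5 digest of the UTF-8 encoded input; the helper names @md5_hex@ and @default_identifier@ and the prefix @example_src_@ are hypothetical, chosen only for illustration.

```python
import hashlib

PREFIX_LENGTH = 12  # data source namespace prefixes are 12 characters long


def md5_hex(value: str) -> str:
    # Assumption: md5() is the hex digest of the UTF-8 bytes of the input.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def default_identifier(source_prefix: str, local_id: str) -> str:
    """Build sourcePrefix::md5(localId) (hypothetical helper)."""
    if len(source_prefix) != PREFIX_LENGTH:
        raise ValueError(
            f"source prefix must be {PREFIX_LENGTH} chars: {source_prefix!r}"
        )
    return f"{source_prefix}::{md5_hex(local_id)}"


# "example_src_" is a made-up 12-character prefix.
print(default_identifier("example_src_", "abc"))
# prints example_src_::900150983cd24fb0d6963f7d28e17f72
```

Because the result depends only on @localId@, any change to the local identifier at the source produces a different OpenAIRE identifier for the same object.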
After years of operation, we can say that:
* @localId@ values are unstable;
* objects can disappear from sources;
* PIDs provided by sources that are not PID agencies (i.e. not authoritative for a specific type of PID) are often wrong (e.g. a pre-print carrying the DOI of its published version, or DOIs with typos).

Therefore, when a record is collected from an authoritative source:
* the record identifier is forged from the PID, e.g. @pidTypePrefix::md5(lowercase(doi))@ for a DOI;
* the PID is added to the @pid@ element of the data model.

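The DOI case can be sketched the same way. This sketch assumes the PID type prefix is the PID type name right-padded with underscores to 12 characters (which is what the prefixes in the table appear to be) and applies only the lowercasing stated above; any other DOI normalization, such as stripping a resolver URL, is not covered. The helper names and the DOI value are hypothetical.

```python
import hashlib

PREFIX_LENGTH = 12


def pid_type_prefix(pid_type: str) -> str:
    # Assumption: pad the PID type name with "_" up to 12 characters.
    if len(pid_type) > PREFIX_LENGTH:
        raise ValueError(f"PID type too long for a prefix: {pid_type!r}")
    return pid_type.ljust(PREFIX_LENGTH, "_")


def doi_identifier(doi: str) -> str:
    """Build pidTypePrefix::md5(lowercase(doi)) (hypothetical helper)."""
    digest = hashlib.md5(doi.lower().encode("utf-8")).hexdigest()
    return f"{pid_type_prefix('doi')}::{digest}"


# Case variants of the same (made-up) DOI map to the same identifier.
assert doi_identifier("10.1234/ABC.5") == doi_identifier("10.1234/abc.5")
print(doi_identifier("10.1234/abc.5"))
```

Lowercasing before hashing means that DOIs differing only in letter case yield one identifier, which matches the fact that DOI names are case-insensitive.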
When a record is collected from a source that is not authoritative for any type of PID:
* the record identifier is forged as usual from the local identifier;
* the PID, if available, is added as an alternate identifier (@alternateIdentifier@).

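The two cases can be combined into a single decision, sketched below. Everything here is hypothetical: the @MappedRecord@ type, the @map_record@ function, and the input shape (PIDs as @(pid_type, value)@ pairs plus the set of PID types the source is authoritative for). Three further choices are assumptions not stated by the policy: when several authoritative PIDs are present, the first one forges the identifier (a placeholder, since this case is still open); only DOIs are lowercased; and, in the authoritative case, PIDs of other types are kept as alternate identifiers.

```python
import hashlib
from dataclasses import dataclass, field

PREFIX_LENGTH = 12


def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()


@dataclass
class MappedRecord:
    identifier: str
    pid: list[tuple[str, str]] = field(default_factory=list)
    alternate_identifier: list[tuple[str, str]] = field(default_factory=list)


def map_record(source_prefix: str, local_id: str,
               pids: list[tuple[str, str]],
               authoritative_types: set[str]) -> MappedRecord:
    authoritative = [p for p in pids if p[0] in authoritative_types]
    if authoritative:
        # Placeholder: the first authoritative PID forges the identifier.
        pid_type, value = authoritative[0]
        # Lowercasing is stated for DOIs only; other values kept as-is.
        key = value.lower() if pid_type == "doi" else value
        return MappedRecord(
            identifier=f"{pid_type.ljust(PREFIX_LENGTH, '_')}::{md5_hex(key)}",
            pid=authoritative,
            # Assumption: non-authoritative PIDs become alternates.
            alternate_identifier=[p for p in pids if p not in authoritative],
        )
    # Non-authoritative source: usual local-id identity, PIDs as alternates.
    return MappedRecord(
        identifier=f"{source_prefix}::{md5_hex(local_id)}",
        alternate_identifier=list(pids),
    )


record = map_record("example_src_", "abc", [("doi", "10.1234/ABC")], set())
print(record.identifier)  # example_src_::900150983cd24fb0d6963f7d28e17f72
```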
As of November 2021, the following data sources are used as "PID authorities":

|_. PID Type |_. Prefix (12 chars) |_. Authority |
| doi | @doi_________@ | Crossref, DataCite, Zenodo |
| pmc | @pmc_________@ | Europe PubMed Central, PubMed Central |
| pmid | @pmid________@ | Europe PubMed Central, PubMed Central |
| arXiv | @arXiv_______@ | arXiv.org e-Print Archive |
| handle | @handle______@ | any repository |

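For illustration, the table can be held as plain data and queried to find the PID types a source is authoritative for. The sketch reads "any repository" as: every source flagged as a repository is an authority for handles. The @is_repository@ flag, the @ANY_REPOSITORY@ marker, and the helper names are hypothetical, and matching sources by display name is a simplification.

```python
PREFIX_LENGTH = 12
ANY_REPOSITORY = "*"  # stands for "any repository" in the table

# PID type -> data sources acting as authority (as of November 2021).
PID_AUTHORITIES = {
    "doi": {"Crossref", "DataCite", "Zenodo"},
    "pmc": {"Europe PubMed Central", "PubMed Central"},
    "pmid": {"Europe PubMed Central", "PubMed Central"},
    "arXiv": {"arXiv.org e-Print Archive"},
    "handle": {ANY_REPOSITORY},
}


def pid_type_prefix(pid_type: str) -> str:
    # The table's prefixes equal the PID type padded with "_" to 12 chars.
    return pid_type.ljust(PREFIX_LENGTH, "_")


def authoritative_types(source_name: str, is_repository: bool) -> set[str]:
    """PID types for which a source is an authority (hypothetical helper)."""
    result = set()
    for pid_type, authorities in PID_AUTHORITIES.items():
        if source_name in authorities or (
            is_repository and ANY_REPOSITORY in authorities
        ):
            result.add(pid_type)
    return result


print(sorted(authoritative_types("PubMed Central", is_repository=False)))
# prints ['pmc', 'pmid']
```

A source listed for both @pmc@ and @pmid@ is returned as authoritative for both types, which is exactly the case the TODO below leaves open.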
TODO: what happens to records with both a pmc and a pmid? Does pmc win?

OpenAIRE also performs duplicate identification (see the dedicated section for details).
All duplicates are "merged" into a "representative record", which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot reuse the identifier of any of the aggregated records).
The following strategy is applied to generate the OpenAIRE identifier of a representative record, so that it is as stable as possible:

TODO