OpenAIRE entity identifier and PID mapping policy » History » Version 1
Alessia Bardi, 05/11/2021 03:54 PM
h1. OpenAIRE entity identifier and PID mapping policy

(copied from https://docs.google.com/document/d/1PnvZpmhbanJu3AeOT-zdIyMKIHoGKC4_Z0UtDFDZAeM/edit#)

OpenAIRE assigns an internal identifier to each object it collects.
By default, the internal identifier is generated as @sourcePrefix::md5(localId)@, where:
* @sourcePrefix@ is a 12-character namespace prefix assigned to the data source at registration time;
* @localId@ is the identifier assigned to the object by the data source.

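The default rule can be illustrated with a short Python sketch. It assumes that @localId@ is UTF-8 encoded before hashing and that the MD5 digest is written as lowercase hexadecimal; the policy states neither detail. The function name, the prefix @exampleRepo_@ and the local identifier are hypothetical.

```python
import hashlib


def default_openaire_id(source_prefix: str, local_id: str) -> str:
    """Return an identifier of the form sourcePrefix::md5(localId).

    Assumptions (not stated in the policy): localId is UTF-8 encoded
    before hashing, and the digest is written as lowercase hex.
    """
    if len(source_prefix) != 12:
        raise ValueError("sourcePrefix must be exactly 12 characters long")
    digest = hashlib.md5(local_id.encode("utf-8")).hexdigest()
    return f"{source_prefix}::{digest}"


# Hypothetical data source prefix and local identifier, for illustration only.
example_id = default_openaire_id("exampleRepo_", "oai:repo.example.org:1234")
print(example_id)
```

Because the hash input is the local identifier itself, any change to @localId@ at the source produces a different OpenAIRE identifier for the same object.
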
After years of operation, we have observed that:
* @localId@ values are unstable, i.e. a source may change the local identifier of the same object over time;
* objects can disappear from sources;
* PIDs provided by sources that are not PID agencies (i.e. not authoritative for a specific type of PID) are often wrong, e.g. pre-prints carrying the DOI of the published version, or DOIs with typos.

Therefore, when a record is collected from a source that is authoritative for a type of PID:
* the identity of the record is forged from the PID, e.g. @pidTypePrefix::md5(lowercase(doi))@ for a DOI;
* the PID is added to the @pid@ element of the data model.

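A similar sketch for the PID-based rule, using the DOI prefix @doi_________@ from the PID authority table on this page. Only the lowercasing step comes from the policy; UTF-8 encoding, the hex digest format and the absence of any other normalization (for example, stripping a @https://doi.org/@ resolver prefix) are assumptions. The function name and the DOI values are hypothetical.

```python
import hashlib

# 12-char prefix for DOIs, as listed in the PID authority table.
DOI_PREFIX = "doi_________"


def pid_based_id(pid_type_prefix: str, pid_value: str) -> str:
    """Return pidTypePrefix::md5(lowercase(pid)).

    Assumption: lowercasing is the only normalization applied, the value
    is UTF-8 encoded, and the digest is written as lowercase hex.
    """
    normalized = pid_value.lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"{pid_type_prefix}::{digest}"


# The same (hypothetical) DOI written with different letter case.
a = pid_based_id(DOI_PREFIX, "10.1234/ABC.5678")
b = pid_based_id(DOI_PREFIX, "10.1234/abc.5678")
print(a == b)  # True
```

Lowercasing matters because DOI names are case-insensitive: the same DOI written in different letter cases must forge the same identity.
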
When a record is collected from a source that is not authoritative for any type of PID:
* the identity of the record is forged as usual from the local identifier;
* the PIDs, if available, are added as @alternateIdentifier@ elements.

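The two cases above can be combined into a single decision, sketched below in Python. The record layout (a local identifier plus a list of @(pid_type, value)@ pairs), the @authority_for@ parameter and the output keys are hypothetical simplifications, not the actual OpenAIRE data model; only two PID prefixes are included. Where the policy is silent, the sketch makes explicit choices, marked as assumptions in the comments: the first authoritative PID forges the identity, and PIDs of types the source is not authoritative for are kept as @alternateIdentifier@ entries.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# 12-char PID type prefixes, taken from the PID authority table on this page.
PID_PREFIXES: Dict[str, str] = {"doi": "doi_________", "pmid": "pmid________"}


def _md5_hex(value: str) -> str:
    # Assumption: UTF-8 encoding and a lowercase hex digest.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def assign_identity(source_prefix: str, authority_for: Set[str],
                    local_id: str, pids: List[Tuple[str, str]]) -> dict:
    """Apply the authoritative / non-authoritative rules to one record.

    authority_for: PID types the collecting source is authoritative for.
    pids: (pid_type, value) pairs declared by the record.
    """
    authoritative = [(t, v) for t, v in pids if t in authority_for]
    if authoritative:
        # Assumption: the first authoritative PID forges the identity;
        # the policy does not say which PID wins when there are several.
        pid_type, value = authoritative[0]
        # The policy only specifies lowercasing for DOIs.
        key = value.lower() if pid_type == "doi" else value
        return {
            "id": f"{PID_PREFIXES[pid_type]}::{_md5_hex(key)}",
            "pid": authoritative,
            # Assumption: PIDs of other types are kept as alternate identifiers.
            "alternateIdentifier": [(t, v) for t, v in pids if t not in authority_for],
        }
    # Non-authoritative source: identity from the local identifier, PIDs as alternates.
    return {
        "id": f"{source_prefix}::{_md5_hex(local_id)}",
        "pid": [],
        "alternateIdentifier": list(pids),
    }


# Hypothetical prefixes and values, for illustration only.
repo = assign_identity("exampleRepo_", set(), "oai:repo.example.org:1", [("doi", "10.1234/X")])
crossref = assign_identity("crossref____", {"doi"}, "some-local-id", [("doi", "10.1234/X")])
print(repo["id"])
print(crossref["id"])
```

The same DOI thus yields a local-identifier-based identity when collected from a repository, and a DOI-based identity when collected from a DOI authority.
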
As of November 2021, the following data sources are used as "PID authorities":

|_. PID Type |_. Prefix (12 chars) |_. Authority |
| doi | @doi_________@ | Crossref, DataCite, Zenodo |
| pmc | @pmc_________@ | Europe PubMed Central, PubMed Central |
| pmid | @pmid________@ | Europe PubMed Central, PubMed Central |
| arXiv | @arXiv_______@ | arXiv.org e-Print Archive |
| handle | @handle______@ | any repository |

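The authority table above can also be held as a lookup structure, as in this Python sketch. The @is_authoritative@ helper, the exact spelling of the data source names used as keys, and the wildcard used for the handle row ("any repository") are hypothetical modelling choices.

```python
from typing import Dict, FrozenSet, NamedTuple


class PidAuthority(NamedTuple):
    prefix: str                  # 12-character namespace prefix
    authorities: FrozenSet[str]  # data sources treated as PID authorities


# Hypothetical wildcard standing for "any repository".
ANY_REPOSITORY = "*"

PID_AUTHORITIES: Dict[str, PidAuthority] = {
    "doi": PidAuthority("doi_________", frozenset({"Crossref", "DataCite", "Zenodo"})),
    "pmc": PidAuthority("pmc_________", frozenset({"Europe PubMed Central", "PubMed Central"})),
    "pmid": PidAuthority("pmid________", frozenset({"Europe PubMed Central", "PubMed Central"})),
    "arXiv": PidAuthority("arXiv_______", frozenset({"arXiv.org e-Print Archive"})),
    "handle": PidAuthority("handle______", frozenset({ANY_REPOSITORY})),
}

# Every prefix in the table must be exactly 12 characters long.
assert all(len(a.prefix) == 12 for a in PID_AUTHORITIES.values())


def is_authoritative(pid_type: str, datasource: str) -> bool:
    """True if `datasource` is listed as an authority for `pid_type`."""
    entry = PID_AUTHORITIES.get(pid_type)
    if entry is None:
        return False
    return ANY_REPOSITORY in entry.authorities or datasource in entry.authorities


print(is_authoritative("doi", "Zenodo"))  # True
```

The module-level check enforces the 12-character prefix constraint from the table header each time the table is loaded, so a mistyped prefix fails early.
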
TODO: what happens to records that have both a pmc and a pmid? Does pmc win?

OpenAIRE also performs duplicate identification (see the dedicated section for details).
All duplicates are merged into a "representative record", which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot reuse the identifier of any of the aggregated records).
The following strategy is applied to generate the OpenAIRE identifier of a representative record, so that it is as stable as possible:

TODO