Alessia Bardi, 05/11/2021 03:54 PM


OpenAIRE entity identifier and PID mapping policy

(copied from https://docs.google.com/document/d/1PnvZpmhbanJu3AeOT-zdIyMKIHoGKC4_Z0UtDFDZAeM/edit#)

OpenAIRE assigns an internal identifier to each object it collects.
By default, the internal identifier is generated as sourcePrefix::md5(localId) where
  • sourcePrefix is a namespace prefix of 12 chars assigned to the data source at registration time
  • localId is the identifier assigned to the object by the data source
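
The default scheme can be sketched in Python as follows. This is a minimal illustration, not OpenAIRE's actual implementation: it assumes the local identifier is UTF-8 encoded before hashing and that the MD5 digest is rendered as lowercase hexadecimal. The function name, the example prefix and the example local identifier are hypothetical.

```python
import hashlib

PREFIX_LENGTH = 12  # namespace prefixes are 12 characters long


def default_identifier(source_prefix: str, local_id: str) -> str:
    """Build an identifier of the form sourcePrefix::md5(localId)."""
    if len(source_prefix) != PREFIX_LENGTH:
        raise ValueError(f"source prefix must be {PREFIX_LENGTH} characters long")
    # Assumption: UTF-8 encoding and hexadecimal rendering of the digest.
    digest = hashlib.md5(local_id.encode("utf-8")).hexdigest()
    return f"{source_prefix}::{digest}"


# Hypothetical data source prefix and local identifier.
print(default_identifier("example_____", "oai:example.org:1234"))
```

Because the hash is computed from the local identifier only, any change to that identifier at the source yields a different internal identifier.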
After years of operation, we can say that:
  • localId values are unstable
  • objects can disappear from sources
  • PIDs provided by sources that are not PID agencies (authoritative sources for a specific type of PID) are often wrong (e.g. pre-print with the DOI of the published version, DOIs with typos)
Therefore, when the record is collected from an authoritative source:
  • the identifier of the record is generated from the PID, e.g. pidTypePrefix::md5(lowercase(doi))
  • the PID is added in a pid element of the data model.
When the record is collected from a source which is not authoritative for any type of PID:
  • the identifier of the record is generated as usual from the local identifier;
  • the PID, if available, is added as an @alternateIdentifier@.
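
The two cases above can be sketched as a single decision. This is a hedged illustration under stated assumptions, not the production logic: the names RecordIdentity, build_identity and authority_prefix are hypothetical, the source prefixes in the usage lines are made up, and lowercasing is applied to every PID type even though the text only shows it for DOIs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pid = Tuple[str, str]  # (pid type, pid value)


@dataclass
class RecordIdentity:
    identifier: str
    pid: List[Pid] = field(default_factory=list)
    alternate_identifier: List[Pid] = field(default_factory=list)


def _md5_hex(text: str) -> str:
    # Assumption: UTF-8 encoding and hexadecimal rendering of the digest.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_identity(
    source_prefix: str,
    local_id: str,
    pid: Optional[Pid] = None,
    authority_prefix: Optional[str] = None,
) -> RecordIdentity:
    """authority_prefix is the 12-char PID type prefix when the source is
    authoritative for the PID type, and None otherwise (hypothetical API)."""
    if pid is not None and authority_prefix is not None:
        # Authoritative source: identifier from the PID, PID goes into `pid`.
        identifier = f"{authority_prefix}::{_md5_hex(pid[1].lower())}"
        return RecordIdentity(identifier, pid=[pid])
    # Non-authoritative source: identifier from the local id,
    # PID (if any) goes into `alternate_identifier`.
    identifier = f"{source_prefix}::{_md5_hex(local_id)}"
    alternates = [pid] if pid is not None else []
    return RecordIdentity(identifier, alternate_identifier=alternates)


# Same DOI from an authoritative and a non-authoritative source (made-up prefixes).
trusted = build_identity("crossref____", "rec-1", ("doi", "10.1234/ABC"), "doi_________")
untrusted = build_identity("example_____", "rec-9", ("doi", "10.1234/ABC"))
print(trusted.identifier, trusted.pid)
print(untrusted.identifier, untrusted.alternate_identifier)
```

With this shape, two authoritative records carrying the same DOI in different letter case receive the same identifier, while a non-authoritative record keeps its source-scoped identifier and only carries the DOI as an alternate identifier.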

As of November 2021, the following data sources are used as "PID authorities":

| PID Type | Prefix (12 chars) | Authority                             |
| doi      | doi_________      | Crossref, Datacite, Zenodo            |
| pmc      | pmc_________      | Europe PubMed Central, PubMed Central |
| pmid     | pmid________      | Europe PubMed Central, PubMed Central |
| arXiv    | arXiv_______      | arXiv.org e-Print Archive             |
| handle   | handle______      | any repository                        |
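
The prefixes listed above look like the PID type right-padded with underscores to 12 characters. Assuming that is indeed the rule (the text does not state it explicitly), the list can be expressed as a small lookup. The names below are hypothetical, and matching a data source by its display name is an assumption made only for illustration.

```python
from typing import Dict, Optional, Set

PREFIX_LENGTH = 12

# PID type -> names of authoritative sources; None means "any repository".
PID_AUTHORITIES: Dict[str, Optional[Set[str]]] = {
    "doi": {"Crossref", "Datacite", "Zenodo"},
    "pmc": {"Europe PubMed Central", "PubMed Central"},
    "pmid": {"Europe PubMed Central", "PubMed Central"},
    "arXiv": {"arXiv.org e-Print Archive"},
    "handle": None,
}


def pid_type_prefix(pid_type: str) -> str:
    """Right-pad the PID type with underscores to 12 characters (assumed rule)."""
    if not 0 < len(pid_type) <= PREFIX_LENGTH:
        raise ValueError("PID type must be 1 to 12 characters long")
    return pid_type.ljust(PREFIX_LENGTH, "_")


def is_authority(source_name: str, pid_type: str) -> bool:
    """True when the named source is listed as an authority for the PID type."""
    if pid_type not in PID_AUTHORITIES:
        return False
    authorities = PID_AUTHORITIES[pid_type]
    return authorities is None or source_name in authorities


print(pid_type_prefix("pmid"), is_authority("Crossref", "doi"))
```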

TODO: WHAT HAPPENS FOR RECORDS WITH BOTH pmc and pmid? pmc wins?

OpenAIRE also performs duplicate identification (see dedicated section for details).
All duplicates are "merged" together into a "representative record", which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot reuse the identifier of one of the aggregated records).
The following strategy is applied to generate the OpenAIRE identifier of a representative record, to ensure it is as stable as possible:

TODO
