Alessia Bardi, 05/11/2021 11:44 AM
Documentation intro and TOC


The OpenAIRE Research Graph

The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide and is key to fostering Open Science and establishing its practices in daily research activities.
Conceived as a public and transparent good, populated from data sources trusted by scientists, the Graph aims to bring the discovery, monitoring, and assessment of science back into the hands of the scientific community.

Imagine a vast collection of research products, all linked together, contextualised, and openly available. For the past ten years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products (articles, datasets, software, and other research products) and entities such as organisations, funders, funding streams, projects, communities, and data sources.
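The model described above, research products linked to one another and to contextual entities, can be pictured as a typed graph of nodes and labelled edges. The minimal Python sketch below only illustrates that shape. The `Node` and `Graph` classes, the node kinds, and relation labels such as `producedBy` are hypothetical and are not the actual OpenAIRE data model.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified node kinds mirroring the prose above; these are
# illustrative assumptions, NOT the OpenAIRE Research Graph schema.
PRODUCT_KINDS = {"publication", "dataset", "software", "other"}
ENTITY_KINDS = {
    "organisation", "funder", "funding_stream",
    "project", "community", "datasource",
}


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # one of PRODUCT_KINDS or ENTITY_KINDS
    title: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # id -> Node
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node: Node) -> None:
        if node.kind not in PRODUCT_KINDS | ENTITY_KINDS:
            raise ValueError(f"unknown node kind: {node.kind}")
        self.nodes[node.id] = node

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        # Refuse dangling edges: both endpoints must already be in the graph.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append((source_id, relation, target_id))

    def neighbours(self, node_id: str, relation: Optional[str] = None) -> list:
        """Return nodes linked from node_id, optionally filtered by relation label."""
        return [
            self.nodes[target]
            for source, rel, target in self.edges
            if source == node_id and (relation is None or rel == relation)
        ]
```

For example, adding a publication and a project, linking them with `link("pub1", "producedBy", "proj1")`, and calling `neighbours("pub1")` returns the project node. Keeping edges as a flat list of triples keeps the sketch short, but `neighbours` scans every edge, so a collection of this size would need indexed adjacency lists or a dedicated graph store.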

As of today, the OpenAIRE Research Graph aggregates around 450 million metadata records with links, collected from about 10,000 data sources trusted by scientists, including:
  • Repositories registered in OpenDOAR or re3data.org
  • Open Access journals registered in DOAJ
  • Crossref
  • Unpaywall
  • ORCID
  • Microsoft Academic Graph
  • DataCite

After cleaning, deduplication, enrichment, and full-text mining, the graph is analysed to produce statistics for OpenAIRE MONITOR (https://monitor.openaire.eu) and the Open Science Observatory (https://osobservatory.openaire.eu). The graph is also made discoverable via OpenAIRE EXPLORE (https://explore.openaire.eu) and is programmatically accessible as described at https://develop.openaire.eu.
JSON dumps of the graph are also published on Zenodo.
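Because the dumps are large, streaming them record by record avoids loading a whole file into memory. The sketch below assumes each downloaded file is gzip-compressed JSON Lines (one JSON object per line). That layout, the helper names `iter_records` and `count_by_field`, and the `type` field used in the usage note are assumptions for illustration, so check the documentation accompanying the dump on Zenodo before relying on them.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line.

    Assumption: the file is gzip-compressed JSON Lines (one object per line).
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_by_field(path: Path, field_name: str) -> Counter:
    """Count records by the value of a top-level field (missing values count as None)."""
    return Counter(record.get(field_name) for record in iter_records(path))
```

For instance, `count_by_field(Path("part-00000.json.gz"), "type")` would tally records per value of a hypothetical top-level `type` field; the file name here is also only a placeholder.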

Graph Data Dumps

  • Drawing of the schema/data model
  • Tables with entities, relationships, data types, vocabularies, and semantics of properties
  • FAQ

Graph provision processes

  • OpenAIRE entity identifier & PID mapping policy
  • Aggregation business logic by major sources:
    • Unpaywall integration
    • Crossref integration
    • ORCID integration
    • Cross-cleaning actions: hostedBy patch
    • Scholexplorer business logic (relationship resolution)
    • DataCite
    • EuropePMC
    • more…
  • Deduplication business logic
    • For research outputs
    • For research organizations
  • Enrichment
    • Mining business logic
    • Deduction-based inference
    • Propagation business logic
  • Post-cleaning business logic
  • FAQ

Updated by Alessia Bardi over 2 years ago · 1 revision