OpenAIRE Research Graph » History » Revision 33
Alessia Bardi, 10/11/2021 02:56 PM
Moving to dedicated section (Deduplication)

The OpenAIRE Research Graph¶

The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide and is key to fostering Open Science and establishing its practices in daily research activities.
Conceived as a public and transparent good and populated from data sources trusted by scientists, the Graph aims to bring the discovery, monitoring, and assessment of science back into the hands of the scientific community.

Imagine a vast collection of research products, all linked together, contextualised and openly available. For the past ten years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products (articles, datasets, software, and other research products) and entities such as organisations, funders, funding streams, projects, communities, and data sources.

As of today, the OpenAIRE Research Graph aggregates around 450 million metadata records, with links, collected from 10K data sources trusted by scientists, including:

Repositories registered in OpenDOAR or re3data.org
Open Access journals registered in DOAJ
Crossref
Unpaywall
ORCID
Microsoft Academic Graph
DataCite

After cleaning, deduplication, enrichment and full-text mining, the graph is analysed to produce statistics for OpenAIRE MONITOR (https://monitor.openaire.eu) and the Open Science Observatory (https://osobservatory.openaire.eu). It is also made discoverable via OpenAIRE EXPLORE (https://explore.openaire.eu) and programmatically accessible as described at https://develop.openaire.eu.
JSON dumps are also published on Zenodo.

TODO: image of high-level data model (entities and semantic relationships, we can draw here: https://docs.google.com/drawings/d/1c4s7Pk2r9NgV_KXkmX6mwCKBIQ-yK3_m6xsxB-3Km1s/edit)

Graph Data Dumps¶

Several dumps are available to suit different user needs. All of them are published in the Zenodo community called OpenAIRE Research Graph.
Here we provide detailed documentation about the full dump:

JSON dump: https://doi.org/10.5281/zenodo.3516917
JSON schema: https://doi.org/10.5281/zenodo.4238938
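
As a first look at a downloaded dump file, the following minimal sketch iterates over its records and counts which top-level fields occur. It assumes the file is gzip-compressed with one JSON object per line; that layout is an assumption and should be checked against the Zenodo record. The function names are hypothetical, and the authoritative field definitions are in the JSON schema linked above.

```python
import gzip
import json
from collections import Counter
from typing import IO, Iterable, Iterator


def iter_records(stream: IO[bytes]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines stream.

    Assumption: the dump part is gzip-compressed and holds one JSON object per line.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_top_level_keys(records: Iterable[dict]) -> Counter:
    """Count how many records carry each top-level field."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

With a local file (the path here is a placeholder), usage would look like `with open("dump-part.json.gz", "rb") as fh: print(count_top_level_keys(iter_records(fh)))`. Streaming line by line keeps memory use flat, which matters for files of this size.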

Json schema ¶

FAQ ¶

Graph provision processes¶

TODO:
3. Processes

harvesting (NOT SURE WHAT IOANNA WANTS: is what we have on graph.openaire.eu OK?)
transformation (NOT SURE WHAT IOANNA WANTS: is what we have on graph.openaire.eu OK?)
~~doiboost~~ (not in processing, in #Aggregation business logic by major sources)
direct zenodo updates
deduplication STARTED
inference
Funder ingestion ("Harry is speaking with funders, gets the list of projects, inference rules, ...")
For each process we need a description of what it does, what its input is, which part of the graph it affects, and anything else of importance (e.g. "if there is no input value for X, then we assume Y and assign the value Z to A").

OpenAIRE entity identifier and PID mapping policy

Aggregation business logic by major sources¶

TODO:
2. input sources

repositories
journals
~~DOIBoost~~
* ~~MAG~~
* ~~Crossref~~
* ~~Unpaywall~~
projects
organizations
openorgs
open citations
openAPC
etc etc

For each input source class, we need to know what's in there, the format, approximate numbers, peculiarities, important aspects of the data (e.g. "Crossref provides us with the authoritative list of publishers!"). Anything else of importance?

DOIBoost is the intersection of Crossref, Unpaywall, Microsoft Academic Graph and ORCID.
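
This page does not spell out the exact rule by which the four sources are combined. As a rough illustration only, the sketch below indexes DOIs by the sources that mention them and leaves the set of required sources as a parameter. The function names and the DOI normalisation rules are assumptions made for this example, not the DOIBoost implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

# Assumption: these are the DOI spellings worth folding together for this example.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and drop a resolver or 'doi:' prefix so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def sources_by_doi(dois_per_source: Mapping[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalised DOI to the set of source names in which it appears."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, dois in dois_per_source.items():
        for doi in dois:
            index[normalise_doi(doi)].add(source)
    return dict(index)


def dois_covered_by(index: Mapping[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return, sorted, the DOIs that appear in every one of the required sources."""
    wanted = set(required)
    return sorted(doi for doi, sources in index.items() if wanted <= sources)
```

Passing all four source names to `dois_covered_by` gives a strict intersection, while passing only `"crossref"` keeps every Crossref DOI together with the record of which other sources also know it. Which of these matches the actual DOIBoost logic is left open here.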

Deduplication

TODOs¶

OpenAIRE entity identifier & PID mapping policy (started, to be completed by Claudio and/or Michele DB)
Aggregation business logic by major sources:
- ~~Unpaywall integration~~
- ~~Crossref integration~~
- ~~ORCID integration~~
- ~~Cross cleaning actions: hostedBy patch~~
- Scholexplorer business logic (relationship resolution)
- DataCite
- EuropePMC
- more….
Deduplication business logic (started, to be completed by Michele DB)
- For research outputs ( ~~publications~~ , datasets, software, orp)
- For research organizations
Enrichment
- Mining business logic
- Deduction-based inference
- Propagation business logic
Post-cleaning business logic
FAQ

Updated by Alessia Bardi over 4 years ago · 33 revisions