The OpenAIRE Research Graph
The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, and is key to fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated from data sources trusted by scientists, the Graph aims to bring discovery, monitoring, and assessment of science back into the hands of the scientific community.
Imagine a vast collection of research products all linked together, contextualised and openly available. For the past ten years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products, such as articles, datasets, software, and other research products, and entities such as organisations, funders, funding streams, projects, communities, and data sources.
As of today, the OpenAIRE Research Graph aggregates around 450 million metadata records, with links, collected from 10,000 data sources trusted by scientists, including repositories registered in OpenDOAR, Open Access journals registered in DOAJ, Crossref, Unpaywall, ORCID and Microsoft Academic Graph. After cleaning, deduplication, and fine-grained classification, these narrow down to ~100 million publications, ~8 million datasets, ~200K software research products, and 8 million other products, linked together with semantic relations. More than 10 million full-texts of Open Access publications are mined by algorithms to enrich metadata records with additional properties and links among research products, funders, projects, communities, and organisations. Thanks to these mining algorithms, the graph is completed with 480 million semantic relations.
Detailed information can be found at https://graph.openaire.eu
Get the dumps
To suit different needs, several dumps are available. All of them are published under the Zenodo community called OpenAIRE Research Graph.
- The whole OpenAIRE Research Graph Dump
Dataset:
Schema:
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It is composed of several files so that you can download only the parts you are interested in. Each file is a tar archive containing gz files, each with one JSON record per line.
- The OpenAIRE COVID-19 dump
Dataset:
Schema:
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It contains metadata records of publications, research data, software and projects on the topic of coronavirus and COVID-19. This dump is part of the activities of OpenAIRE to support the fight against COVID-19, together with the OpenAIRE COVID-19 Gateway. The dump consists of a tar archive containing gzip files with one JSON record per line.
- The dumps about research communities, initiatives and infrastructures
Dataset:
Schema:
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
The dataset contains one file per community/initiative/infrastructure collaborating with OpenAIRE. Also check out their community gateways on CONNECT. Each file is a tar archive containing gzip files with one JSON record per line.
- The dump of ScholeXplorer
Dataset:
Schema (Scholix version 3):
This dataset is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
The dataset contains the GZ-compressed dump of the Scholix links exposed by the OpenAIRE ScholeXplorer service.
- The dump of DOIBoost
Dataset:
Publication:
Software:
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
DOIBoost is a metadata collection that enriches Crossref with inputs from Microsoft Academic Graph, ORCID, and Unpaywall.
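Several of the dumps above are described as tar archives containing gzip files with one JSON record per line. As a rough illustration of how such a file could be read without unpacking it to disk, here is a minimal Python sketch using only the standard library. The function name `iter_records` and the assumption that the inner members end in `.gz` are ours, not part of the dump documentation; check the Zenodo page and the schema of the dump you download before relying on either.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per JSON line from every .gz member of a tar archive.

    Assumption: inner files are gzip-compressed and named with a .gz suffix;
    members that do not match are skipped.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly and decode each line as UTF-8 text.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)
```

Because the function is a generator, records are produced one at a time, which should keep memory use low even for large files. A typical call would loop over `iter_records("some_dump_file.tar")`, where the file name is a placeholder for whichever archive you downloaded.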
Cite us
If you use any of the dumps above for research purposes, please cite them following the recommendation that you find on the corresponding Zenodo page.
The OpenAIRE Research Graph and DOIBoost include data from Microsoft Academic Graph (MAG): please also acknowledge MAG following this guideline.