Paolo Manghi, 29/04/2015 11:28 AM

h1. OpenAIRE infrastructure sub-systems

The OpenAIRE infrastructure features a number of sub-systems, dedicated to four main activities:

* [[aggregation_subsystem|Aggregation sub-system]]: collects [[information_package|information packages]] and publication texts (e.g. PDF, XML, HTML files) from data sources; depending on the typology of such packages (e.g. Dublin Core metadata records, DataCite metadata records, CERIF-XML metadata records, proprietary formats), the system transforms them into metadata records with uniform structure and semantics, matching the specification of the OpenAIRE data model;
* [[deduplication_subsystem|De-duplication sub-system]]: given as input the information space graph, the system identifies duplicates among the objects of the same entity type; for each entity type, the system generates a set of similarity relationships between pairs of objects identified as duplicates, which can be used by the data provision sub-system to generate a disambiguated information space;
* [[informationinference_subsystem|Information inference sub-system]]: given as input the last public information space graph (hence, disambiguated and enriched by inference in the last round) and the publication full texts, the system applies a number of mining algorithms (i.e. "modules"); for each mining module the system produces a set of inferred concepts (called an ActionSet), which can be used by the data provision sub-system to enrich the information space graph;
* [[dataprovision_subsystem|Data provision sub-system]]: given as input the metadata records yielded by the aggregation sub-system, the similarity relationships most recently yielded by the de-duplication sub-system, and the inference ActionSets most recently yielded by the information inference sub-system, the system: 1) populates an initial bare-aggregation information space graph, 2) enriches the graph with similarity relationships and runs an object-merging algorithm to remove duplicates, 3) enriches the graph with inferred information, 4) instantiates the graph over three back-ends: full-text index, OAI-PMH publisher, and PostgreSQL statistics database (a LOD back-end is being developed in OpenAIRE2020).
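
To make the first three data provision steps concrete, here is a minimal Python sketch. It is an illustration only, not the OpenAIRE implementation: the function names, the dictionary-based record layout, the @merged_from@ and @inferred@ keys, and the choice of union-find for grouping duplicates are all assumptions made for this example. Step 4 (publishing to the back-ends) is left out.

```python
def merge_duplicates(records, similarity_pairs):
    """Step 2 (assumed approach): group objects linked by similarity
    relationships with union-find, then merge each group into one object."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similarity_pairs:
        if a not in parent or b not in parent:
            continue  # ignore relationships that point at unknown objects
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller id becomes the representative of the merged group.
            parent[max(ra, rb)] = min(ra, rb)

    merged = {}
    for rid in sorted(records):
        target = merged.setdefault(find(rid), {"merged_from": []})
        target["merged_from"].append(rid)
        for key, value in records[rid].items():
            # The first value seen wins; later duplicates only fill gaps.
            target.setdefault(key, value)
    return merged


def apply_action_sets(graph, action_sets):
    """Step 3: attach inferred information to the (merged) objects."""
    owner = {rid: root for root, obj in graph.items() for rid in obj["merged_from"]}
    for action_set in action_sets:
        for rid, inferred in action_set.items():
            if rid in owner:
                graph[owner[rid]].setdefault("inferred", {}).update(inferred)
    return graph


def build_information_space(records, similarity_pairs, action_sets):
    # Step 1: the bare-aggregation graph is just a copy of the aggregated records.
    graph = {rid: dict(fields) for rid, fields in records.items()}
    graph = merge_duplicates(graph, similarity_pairs)
    return apply_action_sets(graph, action_sets)


# Hypothetical sample data, for illustration only.
records = {
    "r1": {"title": "Open Access in Europe"},
    "r2": {"title": "Open access in Europe", "year": 2014},
    "r3": {"title": "Data citation"},
}
space = build_information_space(
    records,
    similarity_pairs=[("r1", "r2")],
    action_sets=[{"r2": {"subject": "open science"}}],
)
```

In this sketch @r1@ and @r2@ collapse into a single object that keeps the title from @r1@, gains the year from @r2@, and receives the inferred subject, while @r3@ is left unchanged. A real merging algorithm would need a more careful policy for conflicting field values than "first value wins".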