Core Data Model » History » Version 19

Alessia Bardi, 28/04/2015 02:12 PM

h1. OpenAIRE core data model

h2. General concepts

The main entities of the OpenAIRE information space are: *datasets*, *publications*, *persons*, *organisations*, *funders*, *funding streams*, *projects*, and *data sources*.

In our reasoning we generalize the concepts of *datasets* and *publications* to that of project *result*, so that further kinds of research output can be included. OpenAIRE initially proposes two kinds of results: datasets (e.g., experimental data, software products) and publications; others (e.g., patents) can be added in the future. Moreover, a project result is always associated with one or more *instances*, since different “physical representations” of the same result may exist. For example, the same publication may be kept in two different repositories, each exposing the payload file (e.g., PDF) at a different internet location (URL). Finally, an instance of a result is represented as a combination of one or more web resources, one for each sub-part of the result, and of the internet data sources from which such resources are made available.
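
The result/instance/web resource structure described above can be sketched as follows. This is only an illustration: the class names @Result@, @Instance@ and @WebResource@, their fields, and the example URLs are assumptions made for this sketch, not the actual OpenAIRE schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebResource:
    """One sub-part of a result (e.g. the PDF payload) at a given URL."""
    url: str


@dataclass
class Instance:
    """A "physical representation" of a result, as exposed by one data source."""
    data_source: str  # hypothetical data source name
    resources: List[WebResource] = field(default_factory=list)


@dataclass
class Result:
    """A project result; kind is "publication" or "dataset" for now."""
    title: str
    kind: str
    instances: List[Instance] = field(default_factory=list)


# The same publication kept in two repositories, each exposing its own URL.
paper = Result(
    title="An example paper",
    kind="publication",
    instances=[
        Instance("Repository A", [WebResource("http://repo-a.example.org/paper.pdf")]),
        Instance("Repository B", [WebResource("http://repo-b.example.org/paper.pdf")]),
    ],
)

# All locations at which the result can be found, across its instances.
urls = [res.url for inst in paper.instances for res in inst.resources]
```

Keeping @kind@ as a plain value rather than a subclass hierarchy is one way to leave room for further kinds of results (e.g., patents) without changing the structure.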

Similarly, we extend the notion of authors of publications or datasets to that of *persons*, so as to include in the same set people connected to project funding or organizations. For example, “authorship” relationships between results and persons represent the fact that a given person has (co-)authored a given result while being affiliated with a given organization.

*Organizations* include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations will initially be collected from CORDA and CRIS systems, where it is related to projects, or be entered by users, for example to complete authorship information in the database.

Of crucial interest to OpenAIRE is also the identification of the *funders* (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) which co-funded the *projects* that have led to a given result. Funders can be associated with a list of *funding streams* (e.g. FP7 for the EC), each of which identifies a strand of funding. Funding streams can be nested to form a tree of sub-funding streams, and projects are typically associated with the funding stream “leaves” of such trees.
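
The nesting of funding streams can be pictured as a small tree whose leaves are the streams projects attach to. The sketch below is a minimal illustration: @Funder@, @FundingStream@ and @leaves()@ are names chosen here, and the sub-stream names under FP7 are placeholders, not real programme names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FundingStream:
    name: str
    children: List["FundingStream"] = field(default_factory=list)

    def leaves(self) -> Iterator["FundingStream"]:
        """Yield the streams in this subtree that have no sub-streams."""
        if not self.children:
            yield self
            return
        for child in self.children:
            yield from child.leaves()


@dataclass
class Funder:
    name: str
    funding_streams: List[FundingStream] = field(default_factory=list)


# Placeholder sub-streams: only "FP7 for the EC" comes from the text above.
fp7 = FundingStream("FP7", [
    FundingStream("FP7/A", [FundingStream("FP7/A/1"), FundingStream("FP7/A/2")]),
    FundingStream("FP7/B"),
])
ec = Funder("European Commission", [fp7])

# Projects would typically point at one of these leaf streams.
leaf_names = [s.name for stream in ec.funding_streams for s in stream.leaves()]
```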

Finally, OpenAIRE entity instances are created out of data collected from various *data sources* of different kinds, such as publication repositories, dataset archives, CRIS systems, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them. Once each piece of information is extracted from such packages and inserted into the information space as an entity, it is important that it keeps provenance information about the originating data source. This gives visibility to the data source, and also makes it possible to reconstruct the very same piece of information if problems arise.
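
One way to picture provenance tracking is to stamp every extracted entity with a reference to the package and data source it came from. The sketch below assumes a hypothetical, already parsed package layout (a dictionary with @id@ and @entities@ keys); real packages are XML, RDF or JSON records whose structure is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provenance:
    data_source_id: str  # hypothetical identifier of the originating data source
    record_id: str       # hypothetical identifier of the package in that source


@dataclass
class Entity:
    entity_type: str
    properties: dict
    provenance: Provenance


def extract_entities(package: dict, data_source_id: str) -> List[Entity]:
    """Turn one information package into entities, each tagged with its origin."""
    origin = Provenance(data_source_id, package["id"])
    return [
        Entity(item["type"], item["properties"], origin)
        for item in package["entities"]
    ]


# A single package describing two entities (assumed layout, for illustration).
package = {
    "id": "record-1",
    "entities": [
        {"type": "result", "properties": {"title": "An example paper"}},
        {"type": "person", "properties": {"name": "An example author"}},
    ],
}
entities = extract_entities(package, "repository-a")
```

With the origin stored on each entity, the same piece of information can be traced back to its package and re-extracted if needed.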

h2. OpenAIRE relationships and the CERIF semantic layer

For more information, see *CERIF's web site*: http://www.eurocris.org

According to CERIF's data model vision: (i) the “horizontal” classification of entities (e.g., by vocabularies of terms) is not modeled through properties bound to given controlled vocabularies, and (ii) semantic relationships between entities are not modeled by adding dedicated relationships. In both cases, CERIF introduces a flexible modeling mechanism that allows classification semantics to be injected into “semantics-agnostic” entities and relationships. The mechanism is obtained by introducing two entities, Scheme and Class, such that:

* *Class* A Class represents one term of a classification (e.g., a vocabulary or a taxonomy) under a given Scheme. As such it is characterized by the following properties: a Code, which represents the persistent identifier associated with the term (e.g., real-world classifications, such as ISO vocabularies for countries, have a standard identification code for terms), a name, an acronym, a description, a StartDate, and an EndDate.
* *Scheme* A Scheme identifies the existence of a classification scheme, which is modeled as a set of Class objects. A Scheme is characterized by the following properties: a Code, which represents the persistent identifier associated with the Scheme (e.g., real-world schemes, such as taxonomies, may have a standard identification code), a name, an acronym, a description, a StartDate, and an EndDate.
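
The two entities above can be sketched as plain records. The property names follow the lists above; the Python field names, the @add@ helper and the scheme code used in the example are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class Class:
    """One term of a classification, under a given Scheme."""
    code: str  # persistent identifier of the term
    name: str
    acronym: Optional[str] = None
    description: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None


@dataclass
class Scheme:
    """A classification scheme, modeled as a set of Class objects."""
    code: str  # persistent identifier of the scheme
    name: str
    acronym: Optional[str] = None
    description: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    classes: Dict[str, Class] = field(default_factory=dict)

    def add(self, term: Class) -> None:
        """Register a term, keyed by its code so codes stay unique."""
        self.classes[term.code] = term


# A country vocabulary; the scheme code is a placeholder chosen for this sketch.
countries = Scheme(code="example:countries", name="Countries")
countries.add(Class(code="IT", name="Italy"))
countries.add(Class(code="PT", name="Portugal"))
```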

According to CERIF's definition, Classes and Schemes can themselves be interlinked to form arbitrarily complex lattices of Classes and Schemes, respectively. In OpenAIRE we adopt a lighter interpretation, introducing the pair Scheme/Class whenever we need a property of type [[type_qualifier|Qualifier]], i.e. a property whose value comes from a controlled vocabulary, or a relationship between core entities in the model. This mechanism makes it possible to flexibly inject relationship semantics and vocabularies into the data model.
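
As a rough illustration of the lighter interpretation, a Qualifier-typed property can be thought of as a Scheme/Class pair whose class code must belong to the scheme's vocabulary. The sketch below is an assumption, not the actual Qualifier definition: the @Qualifier@ fields, the @VOCABULARIES@ table and the scheme code are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies: scheme code -> allowed class codes.
VOCABULARIES = {
    "example:result_kinds": {"publication", "dataset"},
}


@dataclass(frozen=True)
class Qualifier:
    """A property value drawn from a controlled vocabulary (Scheme/Class pair)."""
    scheme: str
    class_code: str

    def __post_init__(self) -> None:
        terms = VOCABULARIES.get(self.scheme)
        if terms is None:
            raise ValueError(f"unknown scheme: {self.scheme}")
        if self.class_code not in terms:
            raise ValueError(f"{self.class_code!r} is not a term of {self.scheme}")


kind = Qualifier("example:result_kinds", "dataset")

# A value outside the vocabulary is rejected.
try:
    Qualifier("example:result_kinds", "patent")
    rejected = False
except ValueError:
    rejected = True
```

Under this reading, adding a new kind of value (e.g., patents) means extending the vocabulary rather than changing the entity definitions.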

h2. OpenAIRE classes of entities

The entities in the data model belong to the following meta-classes:
* *Core entities*: the entities whose information is continuously and incrementally fed to the information space and is of interest to OpenAIRE end-users; namely *[[core_entity_result|Result]]* (Publication and Dataset), *[[core_entity_person|Person]]*, *[[core_entity_organization|Organization]]*, *[[core_entity_datasource|DataSource]]* (Repository, Dataset Archive, CRIS, Aggregator, Entity Registry), *[[core_entity_project|Project]]*, *[[core_entity_funder|Funder]]*, *[[core_entity_fundingstream|Funding Stream]]*;
* *Linking entities*: entities used to model relationships, connecting two or more main entities in a semantics-agnostic way; namely, those denoted by an Entity1_Entity2 notation (see the CERIF semantic layer above).
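
A linking entity in the Entity1_Entity2 style can be sketched as a record that only knows which two entities it connects, with the meaning of the link carried by a Scheme/Class pair. The names below (@ResultPerson@, @links_with@, the identifiers and the scheme and class codes) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ResultPerson:
    """Result_Person linking entity: semantics-agnostic by itself."""
    result_id: str
    person_id: str
    semantics: Tuple[str, str]  # (scheme code, class code), both hypothetical


AUTHORSHIP = ("example:result_person", "authorship")
CONTACT = ("example:result_person", "contact")

links = [
    ResultPerson("result-1", "person-1", AUTHORSHIP),
    ResultPerson("result-1", "person-2", AUTHORSHIP),
    ResultPerson("result-1", "person-3", CONTACT),
]


def links_with(all_links: List[ResultPerson],
               semantics: Tuple[str, str]) -> List[ResultPerson]:
    """Select the links that carry the given Scheme/Class semantics."""
    return [link for link in all_links if link.semantics == semantics]


authors = [link.person_id for link in links_with(links, AUTHORSHIP)]
```

A new kind of relationship between results and persons then needs only a new class under the scheme, not a new linking entity.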