Core Data Model » History » Version 32

Alessia Bardi, 21/07/2015 03:54 PM
highlighting the fact that OpenAIRE adopts a lighter interpretation of class/scheme than CERIF

h1. OpenAIRE core data model

h2. General concepts

The main entities of the OpenAIRE information space are: *datasets*, *publications*, *persons*, *organisations*, *funders*, *funding streams*, *projects*, and *data sources*.

p=. !{width:65%}openaireCoreModel-2.png!
_Figure 1 OpenAIRE Data Model: core entities and relationships._

In our reasoning we generalize the concepts of *datasets* and *publications* to that of project *result*, so that further kinds of research output can be included. OpenAIRE initially proposes two kinds of results: datasets (e.g., experimental data, software products) and publications; others (e.g., patents) can be added in the future. In addition, each project result is always associated with one or more *instances*, in the sense that different “physical representations” of the same result may exist. For example, the same publication may be kept in two different repositories, both exposing the payload file (e.g., PDF) at different internet locations (URLs). Moreover, an instance of a result is represented as a combination of one or more web resources, relative to the sub-parts of the result, and of the internet data sources from which such resources are made available.
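
The result/instance structure described above can be sketched as follows. This is only an illustrative sketch: the names @Result@, @Instance@ and @WebResource@, their fields, and the example URLs are assumptions made for this example, not the actual OpenAIRE schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebResource:
    """One web resource for a sub-part of a result (hypothetical shape)."""
    url: str


@dataclass
class Instance:
    """A "physical representation" of a result, as exposed by one data source."""
    data_source: str  # name of the data source making the resources available
    web_resources: List[WebResource] = field(default_factory=list)


@dataclass
class Result:
    """Generalization of publications and datasets."""
    title: str
    kind: str  # "publication" or "dataset"; further kinds may be added later
    instances: List[Instance] = field(default_factory=list)


# The same publication kept in two repositories, each exposing its own PDF URL.
paper = Result(
    title="An example paper",
    kind="publication",
    instances=[
        Instance("Repository A", [WebResource("http://repo-a.example.org/paper.pdf")]),
        Instance("Repository B", [WebResource("http://repo-b.example.org/paper.pdf")]),
    ],
)

# All locations at which the payload of this result can be found.
urls = [res.url for inst in paper.instances for res in inst.web_resources]
```

Keeping instances separate from the result means that a second copy of the same publication adds an @Instance@ rather than a duplicate @Result@.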

Similarly, the notion of authors of publications or datasets is extended to that of *persons*, so that the same set also includes people connected to project funding or to organizations. For example, “authorship” relationships between results and persons represent the fact that a given person has (co-)authored a given result while being affiliated with a given organization.

*Organizations* include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations will initially be collected from CORDA and CRIS systems, where it is related to projects, or be entered by users, for example to complete authorship information in the database.

Of crucial interest to OpenAIRE is also the identification of the *funders* (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) which co-funded the *projects* that have led to a given result. Funders can be associated with a list of *funding streams* (e.g. FP7 for the EC), each of which identifies a strand of funding. Funding streams can be nested to form a tree of sub-funding streams, and projects are typically associated with the funding stream “leaves” of such trees.
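
The nesting of funding streams can be pictured as a small tree. The sketch below is a minimal illustration under stated assumptions: the classes @Funder@ and @FundingStream@, the helper @leaves@, the sub-stream names and the project identifier are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class FundingStream:
    name: str
    parent: Optional["FundingStream"] = None
    children: List["FundingStream"] = field(default_factory=list)

    def add_child(self, name: str) -> "FundingStream":
        """Create a sub-funding stream nested under this one."""
        child = FundingStream(name, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def path(self) -> List[str]:
        """Names from the root funding stream down to this one."""
        names = []
        node: Optional[FundingStream] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


@dataclass(eq=False)
class Funder:
    name: str
    funding_streams: List[FundingStream] = field(default_factory=list)


def leaves(stream: FundingStream) -> List[FundingStream]:
    """Collect the leaf streams of a funding stream tree."""
    if stream.is_leaf():
        return [stream]
    return [leaf for child in stream.children for leaf in leaves(child)]


ec = Funder("European Commission")
fp7 = FundingStream("FP7")
ec.funding_streams.append(fp7)

# Hypothetical sub-streams, for illustration only.
leaf = fp7.add_child("Sub-stream A").add_child("Sub-stream A.1")
fp7.add_child("Sub-stream B")

# Projects are typically attached to a leaf of the tree.
project_stream: Dict[str, FundingStream] = {"project-1": leaf}
```

Because each stream keeps a reference to its parent, the full funding path of a project can be recovered from the leaf alone with @project_stream["project-1"].path()@.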

Finally, OpenAIRE entity instances are created out of data collected from various *data sources* of different kinds, such as publication repositories, dataset archives, CRIS systems, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly on relationships between them. Once each piece of information is extracted from such packages and inserted into the information space as an entity, it is important that it keeps provenance information relative to the originating data source. This gives visibility to the data source, and also makes it possible to reconstruct the very same piece of information if problems arise.

p=. !{width:50%}provenance.jpg!
_Figure 2 OpenAIRE Data Model: core entities and provenance information._
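
One possible way to picture how entities keep provenance is sketched below. It assumes a simplified package format (a list of dictionaries) and hypothetical names (@Provenance@, @Entity@, @extract_entities@, @record_id@); none of them is taken from the OpenAIRE implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Provenance:
    data_source: str  # originating data source
    record_id: str    # hypothetical identifier of the information package


@dataclass
class Entity:
    entity_type: str
    properties: Dict[str, str]
    provenance: Provenance


def extract_entities(data_source: str, record_id: str, package: List[dict]) -> List[Entity]:
    """Turn one information package into entities that remember their origin."""
    origin = Provenance(data_source, record_id)
    return [Entity(item["type"], item["properties"], origin) for item in package]


# A package describing two entities, collected from one repository.
package = [
    {"type": "publication", "properties": {"title": "An example paper"}},
    {"type": "person", "properties": {"name": "An example author"}},
]
entities = extract_entities("Repository A", "record-1", package)
```

Every entity extracted from the package carries the same @Provenance@ value, so the originating data source and record can be looked up again if the entity has to be rebuilt.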

h2. OpenAIRE and the CERIF semantic layer

For more information, see *CERIF's web site*: http://www.eurocris.org

According to the CERIF data model vision: (i) “horizontal” classification of entities (e.g., by vocabularies of terms) is not modeled through properties associated with given controlled vocabularies and (ii) semantic relationships between entities are not modeled by adding dedicated relationships. In both cases, CERIF introduces a flexible modeling mechanism which allows classification semantics to be injected into “semantics-agnostic” entities and relationships. The mechanism is obtained by introducing two entities, Schemes and Classes, such that:

* *Class*: A Class represents one term of a classification (e.g., a vocabulary or taxonomy) under a given Scheme. As such it is characterized by the following properties: a Code, which represents the persistent identifier associated with the term (e.g., real-world classifications, such as ISO vocabularies for countries, have a standard identification code for terms), a name, an acronym, a description, a StartDate, and an EndDate.
* *Scheme*: A Scheme identifies the existence of a classification scheme, which is modeled as a set of Class objects. A Scheme is characterized by the following properties: a Code, which represents the persistent identifier associated with the Scheme (e.g., real-world schemes, such as taxonomies, may have a standard identification code), a name, an acronym, a description, a StartDate, and an EndDate.

According to CERIF's definition, Classes and Schemes can themselves be interlinked to form arbitrarily complex lattices of Classes and Schemes, respectively.

*In OpenAIRE we adopt a lighter interpretation*, by introducing the pair Scheme/Class whenever we need to introduce a property of type *[[type_qualifier|Qualifier]]*, i.e. a property whose value comes from a controlled vocabulary, or a relationship between core entities in the model. This mechanism makes it possible to flexibly inject relationship semantics and vocabularies into the data model.
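
As a rough illustration of the Scheme/Class pair and of a Qualifier-typed property, consider the sketch below. The field layout is an assumption made for this example (only Code and name are kept; acronym, description, StartDate and EndDate are omitted), the Class entity is called @ClassTerm@ here, and the scheme code @iso3166-1-alpha-2@ is a hypothetical label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    code: str  # persistent identifier of the classification scheme
    name: str


@dataclass(frozen=True)
class ClassTerm:
    """One term of a Scheme ("Class" in the text; named ClassTerm here
    to avoid confusion with Python's `class` keyword)."""
    code: str  # persistent identifier of the term
    name: str
    scheme: Scheme


@dataclass(frozen=True)
class Qualifier:
    """A property value drawn from a controlled vocabulary (assumed layout)."""
    class_code: str
    scheme_code: str


def make_qualifier(term: ClassTerm) -> Qualifier:
    """Build the Scheme/Class pair that qualifies a property value."""
    return Qualifier(term.code, term.scheme.code)


# Hypothetical scheme label; "IT" is the ISO 3166-1 alpha-2 code for Italy.
countries = Scheme("iso3166-1-alpha-2", "ISO country codes")
italy = ClassTerm("IT", "Italy", countries)

country_of_organization = make_qualifier(italy)
```

The value of the property is only the pair of codes, so the vocabulary can change or grow without changing the entity that uses it.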

h2. OpenAIRE entities, relationships and types

The entities in the data model belong to the following categories:
* *[[coreEntities|Core entities]]*: the entities whose information is continuously and incrementally fed to the information space and is of interest to OpenAIRE end-users; namely *[[core_entity_result|Result]]* ([[core_entity_publication|Publication]] and [[core_entity_dataset|Dataset]]), *[[core_entity_person|Person]]*, *[[core_entity_organization|Organization]]*, *[[core_entity_datasource|DataSource]]* (Repository, Dataset Archive, CRIS, Aggregator, Entity Registry), *[[core_entity_project|Project]]*, *[[core_entity_funder|Funder]]*, *[[core_entity_fundingstream|Funding Stream]]*;
* *[[Linking entities]]*: entities used to model relationships, connecting two or more main entities in a semantics-agnostic way; namely, those denoted by an Entity1_Entity2 notation (see the CERIF semantic layer above);
* *[[types|Types]]*: types are used to define structured values for entity properties. Structured values do not correspond to objects: they have no identity and cannot be shared by different objects.
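
A linking entity in the Entity1_Entity2 style could look like the sketch below. The class @Link@, its fields, the identifiers and the class/scheme codes are all hypothetical; the sketch only shows how a semantics-agnostic link can carry its meaning through a Class/Scheme pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Semantics-agnostic link between two core entities (Entity1_Entity2)."""
    source_type: str
    source_id: str
    target_type: str
    target_id: str
    class_code: str   # term naming the relationship (hypothetical code)
    scheme_code: str  # vocabulary the term belongs to (hypothetical code)

    @property
    def link_type(self) -> str:
        """Name of the linking entity in Entity1_Entity2 notation."""
        return f"{self.source_type}_{self.target_type}"


# An authorship relationship between a person and a result.
authorship = Link(
    source_type="Person",
    source_id="person-1",
    target_type="Result",
    target_id="result-1",
    class_code="isAuthorOf",
    scheme_code="relationship-semantics",
)
```

The link itself does not state what kind of relationship it is; that comes entirely from @class_code@ and @scheme_code@, so new relationship semantics can be added as vocabulary terms rather than as new linking entities.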