Core Data Model » History » Version 32
  Alessia Bardi, 21/07/2015 03:54 PM 
  highlighting the fact that OpenAIRE adopts a lighter interpretation of class/scheme than CERIF
| 1 | 1 | Paolo Manghi | h1. OpenAIRE core data model | 
|---|---|---|---|
| 2 | |||
| 3 | 3 | Paolo Manghi | h2. General concepts | 
| 4 | |||
| 5 | 6 | Claudio Atzori | The main entities of the OpenAIRE information space are: *datasets*, *publications*, *persons*, *organisations*, *funders*, *funding streams*, *projects*, and *data sources*. | 
| 6 | 1 | Paolo Manghi | |
| 7 | 31 | Alessia Bardi | p=. !{width:65%}openaireCoreModel-2.png! | 
| 8 | 28 | Alessia Bardi | _Figure 1 OpenAIRE Data Model: core entities and relationships._ | 
| 9 | 27 | Alessia Bardi | |
| 10 | 1 | Paolo Manghi | In our reasoning we generalize the concepts of *datasets* and *publications* to that of project *result*, so as to be able to include further kinds of research outputs. OpenAIRE initially proposes two kinds of results: datasets (e.g., experimental data, software products) and publications; others (e.g., patents) can be added in the future. Each project result is always associated with one or more instances, since different “physical representations” of the same result may exist. For example, the same publication may be kept in two different repositories, each exposing the payload file (e.g., PDF) at a different internet location (URL). Moreover, an instance of a result is represented as a combination of one or more web resources, corresponding to the sub-parts of the result, and of the internet data sources from which such resources are made available. | 
| 11 | |||
| 12 | 26 | Alessia Bardi | Similarly, the notion of authors of publications or datasets is extended to that of *persons*, so that the same set also includes people connected to project funding or organizations. For example, “authorship” relationships between results and persons represent the fact that a given person has (co-)authored a given result while being affiliated with a given organization. | 
| 13 | 1 | Paolo Manghi | |
| 14 | *Organizations* include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations will initially be collected from CORDA and CRIS systems, insofar as it is related to projects, or be entered by users, for example to complete authorship information in the database. | ||
| 15 | |||
| 16 | Of crucial interest to OpenAIRE is also the identification of the *funders* (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) that co-funded the *projects* that have led to a given result. Funders can be associated with a list of *funding streams* (e.g. FP7 for the EC), which identify the strands of funding that each stream comprises. Funding streams can be nested to form a tree of sub-funding streams, and projects are typically associated with the funding stream “leaves” of such trees. | ||
| 17 | |||
| 18 | 9 | Paolo Manghi | Finally, OpenAIRE entity instances are created out of data collected from various *data sources* of different kinds, such as publication repositories, dataset archives, CRIS systems, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them. Once a piece of information is extracted from such packages and inserted into the information space as an entity, it is important that it keeps provenance information about the originating data source. This gives visibility to the data source, and also enables the reconstruction of the very same piece of information if problems arise. | 
| 19 | 27 | Alessia Bardi | |
| 20 | 29 | Alessia Bardi | p=. !{width:50%}provenance.jpg! | 
| 21 | 28 | Alessia Bardi | _Figure 2 OpenAIRE Data Model: core entities and provenance information._ | 
| 22 | 1 | Paolo Manghi | |
| 23 | 20 | Alessia Bardi | h2. OpenAIRE and the CERIF semantic layer | 
| 24 | 1 | Paolo Manghi | |
| 25 | 10 | Paolo Manghi | For more information, see *CERIF's web site*: http://www.eurocris.org | 
| 26 | 1 | Paolo Manghi | |
| 27 | 10 | Paolo Manghi | According to CERIF's data model vision: (i) “horizontal” classification of entities (e.g., by vocabularies of terms) is not modeled through properties associated with given controlled vocabularies and (ii) semantic relationships between entities are not modeled by adding dedicated relationships. In both cases, CERIF introduces a flexible modeling mechanism which allows injecting classification semantics into “semantics-agnostic” entities and relationships. The mechanism is obtained by introducing two entities, Schemes and Classes, such that: | 
| 28 | |||
| 29 | 1 | Paolo Manghi | * *Class*: A Class represents one term of a classification (e.g., a vocabulary or taxonomy) under a given Scheme. As such it is characterized by the following properties: a Code, which represents the persistent identifier associated with the term (e.g., real-world classifications, such as ISO vocabularies for countries, have a standard identification code for terms), a name, an acronym, a description, a StartDate, and an EndDate. | 
| 30 | * *Scheme*: A Scheme identifies the existence of a classification scheme, which is modeled as a set of Class objects. A Scheme is characterized by the following properties: a Code, which represents the persistent identifier associated with the Scheme (e.g., real-world schemes, such as taxonomies, may have a standard identification code), a name, an acronym, a description, a StartDate, and an EndDate. | ||
| 31 | |||
| 32 | 32 | Alessia Bardi | According to CERIF's definition, Classes and Schemes can themselves be interlinked to form arbitrarily complex lattices of Classes and Schemes, respectively. | 
| 33 | |||
| 34 | *In OpenAIRE we adopt a lighter interpretation*, introducing the pair Scheme/Class whenever we need to introduce a property of type *[[type_qualifier|Qualifier]]*, i.e. a property whose value comes from a controlled vocabulary, or a relationship between core entities in the model. This mechanism makes it possible to flexibly inject relationship semantics and vocabularies into the data model. | ||
| 35 | 1 | Paolo Manghi | |
| 36 | 20 | Alessia Bardi | h2. OpenAIRE entities, relationships and types | 
| 37 | 1 | Paolo Manghi | |
| 38 | 20 | Alessia Bardi | The entities in the data model belong to the following categories: | 
| 39 | 25 | Alessia Bardi | * *[[coreEntities|Core entities]]*: the entities whose information is continuously and incrementally fed to the information space and is of interest to OpenAIRE end-users; namely *[[core_entity_result|Result]]* ([[core_entity_publication|Publication]] and [[core_entity_dataset|Dataset]]), *[[core_entity_person|Person]]*, *[[core_entity_organization|Organization]]*, *[[core_entity_datasource|DataSource]]* (Repository, Dataset Archive, CRIS, Aggregator, Entity Registry), *[[core_entity_project|Project]]*, *[[core_entity_funder|Funder]]*, *[[core_entity_fundingstream|Funding Stream]]*; | 
| 40 | 23 | Alessia Bardi | * *[[Linking entities]]*: entities that model relationships by connecting two or more main entities in a semantics-agnostic way; namely, those denoted by an Entity1_Entity2 notation (see the CERIF semantic layer above). | 
| 41 | 24 | Paolo Manghi | * *[[types|Types]]*: types are used to define structured values for entity properties. In fact, structured values do not correspond to objects, i.e. do not have an identity, and cannot be shared by different objects. |
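
The three categories above can be illustrated with a minimal Python sketch. It is not the OpenAIRE implementation: all class names, field names, and the scheme and class codes used here (@result_types@, @result_person_relations@, @authorship@) are hypothetical and chosen only for illustration. The sketch assumes that a Qualifier is a plain scheme/class pair, that core entities carry their own identifier, and that a linking entity in the Entity1_Entity2 notation takes its semantics from a Qualifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualifier:
    """Type (structured value): no identity of its own, compared by value."""
    scheme: str      # hypothetical code of the Scheme (the controlled vocabulary)
    class_code: str  # hypothetical code of the Class (one term of that vocabulary)


@dataclass
class Person:
    """Core entity: identified by its own id."""
    id: str
    fullname: str


@dataclass
class Result:
    """Core entity: a publication or dataset, classified through a Qualifier."""
    id: str
    title: str
    result_type: Qualifier


@dataclass
class ResultPerson:
    """Linking entity (Result_Person): semantics-agnostic until qualified."""
    result_id: str
    person_id: str
    semantics: Qualifier


# Illustrative instances; the values are made up.
paper = Result("r1", "A sample paper", Qualifier("result_types", "publication"))
author = Person("p1", "Ada Example")
link = ResultPerson(
    paper.id, author.id, Qualifier("result_person_relations", "authorship")
)

# Two qualifiers with equal fields are the same value: they carry no identity.
same_value = Qualifier("result_types", "publication") == paper.result_type
```

In this sketch the relationship semantics live entirely in the @semantics@ value, so a new kind of relationship between results and persons would only need a new class code under the scheme rather than a new linking entity.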