Revision 11 (Claudio Atzori, 10/11/2021 12:28 PM) → Revision 12/26 (Claudio Atzori, 10/11/2021 12:43 PM)
h1. Json schema
The latest version of the JSON schema is available at https://doi.org/10.5281/zenodo.4238938.
For a visual and interactive view of the schema, we suggest using a JSON schema viewer such as https://navneethg.github.io/jsonschemaviewer/ (copy the schema into the viewer and you can then navigate through its nodes).
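For a quick programmatic overview, the properties of a JSON schema can also be listed with a few lines of Python. The following is only a minimal sketch: @SCHEMA_FRAGMENT@ is a hypothetical, heavily trimmed fragment written for this example (not the published schema), @list_paths@ is an illustrative helper name, and the sketch does not resolve @$ref@ pointers, which the real schema may use.

```python
import json

# Hypothetical, trimmed fragment written for this example only.
# The real schema must be downloaded from the Zenodo record above.
SCHEMA_FRAGMENT = json.loads("""
{
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "originalId": {"type": "array", "items": {"type": "string"}},
    "language": {
      "type": "object",
      "properties": {
        "code": {"type": "string"},
        "label": {"type": "string"}
      }
    }
  }
}
""")


def list_paths(schema, prefix=""):
    """Return (path, type) pairs for every property reachable from `schema`."""
    paths = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        paths.append((path, sub.get("type", "?")))
        if sub.get("type") == "array":
            # Descend into the element schema of arrays.
            paths.extend(list_paths(sub.get("items", {}), prefix=f"{path}[]."))
        else:
            # Descend into nested objects (no-op for scalar types).
            paths.extend(list_paths(sub, prefix=f"{path}."))
    return paths


if __name__ == "__main__":
    for path, type_name in list_paths(SCHEMA_FRAGMENT):
        print(f"{path}: {type_name}")
```

To inspect the real schema, replace the inline fragment with the result of @json.load@ on the downloaded schema file.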
TODO
* Drawing of the schema/data model
* data model
entities
attributes
A brief description of each and, for the non-trivial cases, the processes that affect its value
* the title of a publication comes as is from the source. No need to declare that anywhere
* the funder of the publication comes either from the source or is inferred. This we must document
* the refereed field is constructed with some methodology. This we must document
h2. Dump data model overview
h3. Table of main entities
|_. # |_. Entity type |_. Sub-types |_. Description |
|1 | *Result* | | Results are intended as digital objects, described by metadata, resulting from a scientific process |
|1.1 | | *Publication* | Publications include all digital research artefacts whose intended use is the narrative storytelling of a research activity and its results. Examples are scientific articles, reports, slides, data papers, etc. Although there are exceptions, as each scientist has a large degree of freedom in publishing and interlinking their artefacts, it can generally be assumed that literature artefacts are published with narrative intent. For those specific cases where literature is intended for a different use, we generally do not expect scientists to publish such artefacts as literature artefacts. For example, when an article is a carrier of readable datasets (e.g. articles with tables), the article is often deposited a second time in a data repository, assigned a new DOI, and marked as a dataset of type “textual”; when the full texts of articles are used for natural language processing (NLP), scientists will likely create a dataset of type “collection of articles”. |
|1.2 | | *Dataset* | Datasets include digital research artefacts encoding experimental or real-world observations/measures (e.g. primary data), secondary data derived from programmatic processing of other datasets, or more generally digital representations of facts to be interpreted by a program. The definition is cross-discipline, hence it spans multiple interpretations of datasets, where typologies and granularity obey different scientific facets. Examples include, but are not limited to: databases (e.g. Worms), records of databases (e.g. proteins in the UniProt database), table files, queries over databases (time-series slices, geospatial maps, SQL queries), media (e.g. images, videos) or collections of media. |
|1.3 | | *Software* | Software entities represent research software, i.e. software that is an output of research activity. Examples include, but are not limited to: code scripts, web services, and web applications. |
|1.4 | | *Other Research Product* | Other research products include any research output that is not literature, data, or software. Examples include, but are not limited to: algorithms, scientific workflows/pipelines, protocols, standard operating procedures (SOPs), simulations, mathematical and statistical models, but also research packages. Research packages can group a set of research artefacts, but can also include the encoding of a composition logic that binds them together. For example, an instance of a workflow is a package that describes the combination of specific artefacts to implement a scientific process, execute an experiment, etc. |
|2 | *Data source* | | OpenAIRE entity instances are created out of data collected from various data sources of different kinds, such as publication repositories, dataset archives, CRIS systems, funder databases, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them. For example, a metadata record about a project carries information for the creation of a Project entity and its participants (as Organization entities). It is important, once each piece of information is extracted from such packages and inserted into the OpenAIRE information space as an entity, for such pieces to keep provenance information relative to the originating data source. This is to give visibility to the data source, but also to enable the reconstruction of the very same piece of information if problems arise. |
|3 | *Organization* | | Organizations include companies, research centers or institutions involved as project partners or as responsible for operating data sources. Information about organizations is collected from funder databases like CORDA, registries of data sources like OpenDOAR and re3Data, and CRIS systems, as being related to projects or data sources. |
|4 | *Project* | | Of crucial interest to OpenAIRE is also the identification of the funders (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) that co-funded the projects that have led to a given result. Projects are characterized by a list of funding streams (e.g. FP7, H2020 for the EC), which identify the strands of funding. Funding streams can be nested to form a tree of sub-funding streams. |
|5 | *Community/Initiative* | | Communities/Initiatives are intended as groups of people with a common research intent and can be of two types: research initiatives or research communities.
1. Research initiatives are intended to capture a view of the information space that is "research impact"-oriented, i.e. all products generated due to my research initiative;
2. Research communities are intended to capture a view that is "research activity"-oriented, i.e. all products that may be of interest or related to my research initiative.
For example, the organizations supporting a research infrastructure fall in the first category, while the researchers involved in a discipline fall in the second. |
h3. Result
|_. field name |_. cardinality |_. type |_. description |
| id | ONE | string | Main entity identifier, created according to [[OpenAIRE_entity_identifier_and_PID_mapping_policy]] |
| type | ONE | string | Type of the result: one of 'publication', 'dataset', 'software', 'other' as declared in the terms from the "dnet:result_typologies":https://api.openaire.eu/vocabularies/dnet:result_typologies vocabulary |
| originalId | MANY | string | Identifiers of the record at the original sources |
| maintitle | ONE | string | A name or title by which a scientific result is known. May be the title of a publication, of a dataset or the name of a piece of software. |
| subtitle | ONE | string | Explanatory or alternative name by which a scientific result is known. |
| author | MANY | Author | The main researchers involved in producing the data, or the authors of the publication |
| bestaccessright | ONE | AccessRight (should be changed as it must NOT include the openaccessroute) | The most open access right associated with the manifestations of this research result. |
| contributor | MANY | string | The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. |
| country | MANY | Country | Country associated with the result. TODO: explain or link why |
| coverage | MANY | string | ??? |
| dateofcollection | ONE | string | Date when OpenAIRE last collected the record. TODO: we should indicate the used date format |
| description | MANY | string | A brief description of the resource and the context in which the resource was created. |
| embargoenddate | ONE | string | Date when the embargo ends and this result turns Open Access. TODO: we should indicate the used date format |
| instance | MANY | [[Instance]] | Specific materialization or version of the result. For example, you can have one result with three instances: one is the pre-print, one is the post-print, one is the published version |
| language | ONE | Object(code/label) | code: alpha-3/ISO 639-2 code of the language. label: language label in English. Values are controlled by the "dnet:languages":https://api.openaire.eu/vocabularies/dnet:languages vocabulary |
| lastupdatetimestamp | ONE | long | Timestamp of last update of the record in OpenAIRE |
| pid | MANY | Pid | Persistent identifiers of the result |
| publicationdate | ONE | string | Main date of the research product: typically the publication or issued date. When a research result has several versions with different dates, the most frequent well-formatted date is selected. If no date is more frequent than the others, the most recent date among the complete, well-formatted ones is selected. For statistics, the year is extracted and the result is counted only among the results of that year. Example: given a pre-print date of 2019-02-03, an article date of 2020-02 provided by the repository, and an article date of 2020 provided by Crossref, OpenAIRE sets the date to 2019-02-03, because it is the most recent among the complete and well-formed dates. If the repository then updates the metadata and sets a complete date (e.g. 2020-02-12), this becomes the new date of the result, because it is now the most recent complete date. However, if OpenAIRE then collects the pre-print from another repository with date 2019-02-03, this becomes the “winning date”, because it is now the most frequent well-formatted date. |
| publisher | ONE | string | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. |
| source | MANY | string | A related resource from which the described resource is derived. See definition of "Dublin Core field dc:source":https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/source |
| subjects | MANY | Subject | Subject, keyword, classification code, or key phrase describing the resource. |
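
The selection rule described for @publicationdate@ can be sketched in Python as follows. This is only an illustration under stated assumptions, not the OpenAIRE implementation: the function name @select_publication_date@ is hypothetical, "well-formatted" is assumed to mean @YYYY@, @YYYY-MM@ or @YYYY-MM-DD@, "complete" is assumed to mean @YYYY-MM-DD@, and when no single date is the most frequent the sketch prefers the most complete date and then the most recent one. Month and day ranges are not validated.

```python
import re
from collections import Counter

# Assumption: a well-formatted date is YYYY, YYYY-MM or YYYY-MM-DD.
WELL_FORMATTED = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def select_publication_date(dates):
    """Pick one date among the dates of all versions of a result (sketch)."""
    candidates = [d for d in dates if WELL_FORMATTED.fullmatch(d)]
    if not candidates:
        return None

    # Step 1: a single most frequent well-formatted date wins.
    ranked = Counter(candidates).most_common()
    top_count = ranked[0][1]
    winners = [d for d, count in ranked if count == top_count]
    if len(winners) == 1:
        return winners[0]

    # Step 2 (assumed tie-break): most complete first, then most recent.
    # The string length encodes completeness (4, 7 or 10 characters), and
    # ISO-style dates of equal length sort chronologically as strings.
    return max(candidates, key=lambda d: (len(d), d))


if __name__ == "__main__":
    # The three situations from the publicationdate description.
    print(select_publication_date(["2019-02-03", "2020-02", "2020"]))
    print(select_publication_date(["2019-02-03", "2020-02-12", "2020"]))
    print(select_publication_date(["2019-02-03", "2020-02-12", "2020", "2019-02-03"]))
```

The three calls reproduce the example given for the field: the only complete date wins first, then the newer complete date, and finally the date that occurs twice.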