Miriam Baglioni, 23/12/2021 10:34 AM

Json schema¶

Table of contents
Json schema
- Dump data model overview
  - Table of main entities
  - Table of the relationships

The latest version of the json schema is available at https://doi.org/10.5281/zenodo.4238938.
For a visual and interactive view of the schema, we suggest using a JSON schema viewer such as https://navneethg.github.io/jsonschemaviewer/ (copy the schema into the viewer and you can then navigate through its nodes).

TODO

- Drawing of the schema/data model
- Data model
  - entities
  - attributes
- Brief description of each and, for the non-trivial cases, the processes that affect its value
  - the title of a publication comes as is from the source; no need to declare that anywhere
  - the funder of the publication either comes from the source or is inferred; this we must document
  - the refereed field is constructed with some methodology; this we must document

Dump data model overview¶

Table of main entities¶

#	Entity type	Sub-types	Description
1	Result		Results are intended as digital objects, described by metadata, resulting from a scientific process
1.1		Publication	Publications include all digital research artefacts whose intended use is narrative storytelling of a research activity and its results. Examples are scientific articles, reports, slides, data papers, etc. Although there are exceptions, as each scientist has a large degree of freedom in publishing and interlinking their artefacts, it can generally be assumed that literature artefacts are published with narrative intent. For those specific cases where literature is intended for a different use, we in general do not expect scientists to publish such artefacts as literature artefacts. For example, when an article is a carrier of readable datasets (e.g. articles with tables), the article is often deposited a second time in a data repository, assigned a new DOI, and marked as a dataset of type "textual"; when article full texts are used for natural language processing (NLP), scientists will likely create a dataset of type "collection of articles".
1.2		Dataset	Datasets include digital research artefacts encoding experimental or real-world observations/measures (e.g. primary data), secondary data derived from programmatic processing of other datasets, or more generally digital representations of facts to be interpreted by a program. The definition is cross-discipline, hence spans multiple interpretations of datasets, where typologies and granularity obey different scientific facets. Examples include, but are not limited to: databases (e.g. Worms), records of databases (e.g. proteins in the UniProt database), table files, queries over databases (time-series slices, geospatial maps, SQL queries), media (e.g. images, videos) or collections of media.
1.3		Software	Software entities represent research software, i.e. software that is an output of research activity. Examples include, but are not limited to: code scripts, web services, and web applications.
1.4		Other Research Product	Other research products include any research output that is not literature, data, or software. Examples include, but are not limited to: algorithms, scientific workflows/pipelines, protocols, standard operating procedures (SOPs), simulations, mathematical and statistical models, and research packages. Research packages can group a set of research artefacts, but can also include the encoding of a composition logic that binds them together. For example, an instance of a workflow is a package that describes the combination of specific artefacts to implement a scientific process, execute an experiment, etc.
2	Data source		OpenAIRE entity instances are created out of data collected from various data sources of different kinds, such as publication repositories, dataset archives, CRIS systems, funder databases, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them. For example, a metadata record about a project carries information for the creation of a Project entity and its participants (as Organization entities). It is important, once each piece of information is extracted from such packages and inserted into the OpenAIRE information space as an entity, for such pieces to keep provenance information relative to the originating data source. This is to give visibility to the data source, but also to enable the reconstruction of the very same piece of information if problems arise.
3	Organization		Organizations include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations is collected from funder databases like CORDA, registries of data sources like OpenDOAR and re3Data, and CRIS systems, as being related to projects or data sources.
4	Project		Of crucial interest to OpenAIRE is also the identification of the funders (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) that co-funded the projects that have led to a given result. Projects are characterized by a list of funding streams (e.g. FP7, H2020 for the EC), which identify the strands of funding. Funding streams can be nested to form a tree of sub-funding streams.
5	Community/Initiative		Communities/Initiatives are intended as groups of people with a common research intent and can be of two types: research initiatives or research communities. 1. Research initiatives are intended to capture a view of the information space that is "research impact"-oriented, i.e. all products generated due to my research initiative; 2. Research communities are intended to capture a view of the information space that is "research activity"-oriented, i.e. all products that may be of interest or related to my research community. For example, the organizations supporting a research infrastructure fall in the first category, while the researchers involved in a discipline fall in the second.

Table of the relationships¶

A relationship in the graph is represented by the following data type, which models a directed edge between two nodes and provides information about the semantics of the relation, its provenance and its validation.

#	field name	cardinality	type	description
1	source	ONE	Node	Represents the source node in the relation
2	target	ONE	Node	Represents the target node in the relation
3	reltype	ONE	RelType	Represents the semantics of the relation between two nodes of the graph
4	provenance	ONE	Provenance	Indicates the process that produced (or provided) the information
5	validated	ONE	boolean	Indicates whether or not the relation was validated
6	validationDate	ONE	string	Indicates the validation date of the relation - applies only when the validated flag is set to true
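
As an illustration of the data type above, the following minimal sketch builds one relation record and checks it against the field descriptions in the table. The identifiers, the node `type` strings, and the provenance label are placeholders rather than values taken from the dump, and representing a missing `validationDate` as `null` is an assumption; the helper name `check_relation` is hypothetical.

```python
import json

# Fields listed in the table above; validationDate is handled separately
# because it applies only when the validated flag is true.
REQUIRED_FIELDS = ("source", "target", "reltype", "provenance", "validated")


def check_relation(rel):
    """Return a list of problems found in a relation record (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in rel:
            problems.append(f"missing field: {name}")
    # source and target are Node objects: an id and an entity type, both strings.
    for end in ("source", "target"):
        node = rel.get(end)
        if isinstance(node, dict):
            for key in ("id", "type"):
                if not isinstance(node.get(key), str):
                    problems.append(f"{end}.{key} must be a string")
    # A validation date only makes sense on a validated relation.
    if rel.get("validationDate") is not None and rel.get("validated") is not True:
        problems.append("validationDate is set but validated is not true")
    return problems


# Placeholder record: all values below are illustrative assumptions.
sample = json.loads("""
{
  "source": {"id": "example-project-id", "type": "project"},
  "target": {"id": "example-result-id", "type": "result"},
  "reltype": {"type": "outcome", "name": "produces"},
  "provenance": {"provenance": "example provenance label", "trust": "0.9"},
  "validated": false,
  "validationDate": null
}
""")

print(check_relation(sample))
```

The check is deliberately shallow: it does not replace validation against the published JSON schema, which remains the authoritative definition.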

Node¶

The Node data type contains the minimum information needed to identify a graph node: its identifier and its entity type.

#	field name	cardinality	type	description
1	id	ONE	string	OpenAIRE identifier of the node in the graph
2	type	ONE	string	graph node type

RelType¶

The RelType data type models the semantics of the relationship between two nodes.

#	field name	cardinality	type	description
1	type	ONE	string	relation category, e.g. affiliation, citation, see table Relation typologies
2	name	ONE	string	further specifies the relation semantics, indicating the relation direction, e.g. Cites, isCitedBy

Relation typologies¶

The following table lists all the possible relation semantics found in the graph dump.

#	source entity type	target entity type	relType.type	relType.name	relType.name (inverse)
1	Project	Result	outcome	produces	isProducedBy
2	Result	Organization	affiliation	hasAuthorInstitution	isAuthorInstitutionOf
3	Result	Result	similarity	IsAmongTopNSimilarDocuments	HasAmongTopNSimilarDocuments
4	Project	Organization	participation	isParticipant	hasParticipant
5	Result	Result	supplement	IsSupplementTo	IsSupplementedBy
6	Result	Result	relationship	IsRelatedTo	IsRelatedTo
7	Data_source	Organization	provision	provides	isProvidedBy
8	Result	Data_source	provision	IsHostedBy	hosts
9	Result	Data_source	provision	IsProvidedBy	provides
10	Result	CommunityInitiative	relationship	IsRelatedTo	IsRelatedTo
11	Organization	CommunityInitiative	relationship	IsRelatedTo	IsRelatedTo
12	Data_source	CommunityInitiative	relationship	IsRelatedTo	IsRelatedTo
13	Project	CommunityInitiative	relationship	IsRelatedTo	IsRelatedTo
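
Because each row above pairs a relation name with its inverse, a relation can be turned around by swapping source and target and replacing the name. The sketch below shows one way to do this. It assumes that the node `type` strings match the entity labels used in the table, which may not be how the dump encodes them; the names `ROWS`, `INVERSE` and `invert` are hypothetical. The lookup key includes the entity types because `provides` appears in two rows with different inverses.

```python
# (source entity type, target entity type, relType.type, relType.name, inverse name),
# copied from the table above.
ROWS = [
    ("Project", "Result", "outcome", "produces", "isProducedBy"),
    ("Result", "Organization", "affiliation", "hasAuthorInstitution", "isAuthorInstitutionOf"),
    ("Result", "Result", "similarity", "IsAmongTopNSimilarDocuments", "HasAmongTopNSimilarDocuments"),
    ("Project", "Organization", "participation", "isParticipant", "hasParticipant"),
    ("Result", "Result", "supplement", "IsSupplementTo", "IsSupplementedBy"),
    ("Result", "Result", "relationship", "IsRelatedTo", "IsRelatedTo"),
    ("Data_source", "Organization", "provision", "provides", "isProvidedBy"),
    ("Result", "Data_source", "provision", "IsHostedBy", "hosts"),
    ("Result", "Data_source", "provision", "IsProvidedBy", "provides"),
    ("Result", "CommunityInitiative", "relationship", "IsRelatedTo", "IsRelatedTo"),
    ("Organization", "CommunityInitiative", "relationship", "IsRelatedTo", "IsRelatedTo"),
    ("Data_source", "CommunityInitiative", "relationship", "IsRelatedTo", "IsRelatedTo"),
    ("Project", "CommunityInitiative", "relationship", "IsRelatedTo", "IsRelatedTo"),
]


def build_inverse_index(rows):
    """Map (source type, target type, name) to the name of the inverse relation."""
    index = {}
    for src, tgt, _category, name, inverse in rows:
        index[(src, tgt, name)] = inverse
        index[(tgt, src, inverse)] = name
    return index


INVERSE = build_inverse_index(ROWS)


def invert(relation):
    """Return a copy of the relation pointing in the opposite direction."""
    key = (
        relation["source"]["type"],
        relation["target"]["type"],
        relation["reltype"]["name"],
    )
    inverse_name = INVERSE[key]  # raises KeyError for combinations not in the table
    return {
        **relation,
        "source": relation["target"],
        "target": relation["source"],
        "reltype": {**relation["reltype"], "name": inverse_name},
    }


# Placeholder identifiers, for illustration only.
forward = {
    "source": {"id": "example-project-id", "type": "Project"},
    "target": {"id": "example-result-id", "type": "Result"},
    "reltype": {"type": "outcome", "name": "produces"},
}
backward = invert(forward)
print(backward["reltype"]["name"])
```

Inverting a relation twice returns the original record, which is a quick way to sanity-check the lookup table.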

Further releases will extend the set of relationship types exported in the graph dump. The candidate relationships are indicated in the following table:

#	source entity type	target entity type	relType.type	relType.name	relType.name (inverse)
1	Result	Result	relationship	IsReferencedBy	References
2	Result	Result	citation	Cites	IsCitedBy
3	Result	Result	part	HasPart	IsPartOf
4	Result	Result	version	IsPreviousVersionOf	IsNewVersionOf
5	Result	Result	relationship	Continues	IsContinuedBy
6	Result	Result	version	IsVersionOf	HasVersion
7	Result	Result	relationship	IsIdenticalTo	IsIdenticalTo
8	Result	Result	relationship	Documents	IsDocumentedBy
9	Result	Result	relationship	IsDerivedFrom	IsSourceOf
10	Result	Result	version	IsOriginalFormOf	IsVariantFormOf
11	Result	Result	version	Obsoletes	IsObsoletedBy
12	Result	Result	review	Reviews	IsReviewedBy
13	Result	Result	relationship	Compiles	IsCompiledBy

Provenance¶

The Provenance data type indicates the process that produced (or provided) the information, and the trust associated with the information.

#	field name	cardinality	type	description
1	provenance	ONE	string	provenance, contains values defined according to the `dnet:provenanceAction` vocabulary https://api.openaire.eu/vocabularies/dnet:provenanceActions
2	trust	ONE	string	trust, expressed as a number in the range [0, 1], indicating the trustworthiness of the information.
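
Since `trust` is typed as a string but carries a number, consumers will typically convert it before comparing or filtering. A minimal sketch of that conversion follows; the helper name `parse_trust` and the sample provenance label are hypothetical, and rejecting out-of-range values is a design choice of this example, not a documented behaviour of the dump.

```python
def parse_trust(provenance):
    """Convert the trust string of a Provenance object to a float in [0, 1]."""
    value = float(provenance["trust"])  # raises ValueError if not numeric
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"trust out of range [0, 1]: {provenance['trust']!r}")
    return value


# Placeholder provenance object, for illustration only.
example = {"provenance": "example provenance label", "trust": "0.9"}
print(parse_trust(example))
```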
