Task 8.4 Data flows and dynamics monitoring services

Leader: CNR. Participants: UNIBI

The task includes (i) continuous (over-time) data curation and validation of the OpenAIRE Information Space, and (ii) identification of quality processes as new developments are integrated into the infrastructure. Data curators (UNIBI) will define harmonization and validation mappings for OpenAIRE-compatible data sources and make sure that the resulting Information Space meets the quality measures. Activities include the over-time fine-tuning of de-duplication algorithms for publications, authors, and organizations, and of inference algorithms (CNR, ICM, UoA). Monitoring and validation of Information Space quality will be possible thanks to tuning and reporting from the validator service (UoA) and to the certification of flows and dynamics via services to be developed in WP8-T8.4 (CNR), once both have been released.

Examples of Information Space certification elements may concern the “Expected Information Space variance”: e.g. the percentage of OA vs. non-OA material should not exceed a given ratio; the number of inferred relationships can increase but should not drop by 10% or more with respect to the previous status; successive harvests of the same data source should collect an increasing number of objects. Data curators will be able to: (a) configure the services to verify specific data flows and dynamics against given measures and thresholds; (b) be notified of misbehaviors or technical issues in order to fix them or, if tolerable, publish the Information Space as is; (c) access a history of “status certificates” of current and past Information Spaces. Data curators will also be able to configure and follow the overall monitoring activities via user interfaces offering a graphical representation of the activities. CNR will be in charge of the design and development of the service, while UNIBI will provide the requirements of data curators.
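The certification elements above can be sketched as simple rule checks whose outcomes are collected into a status certificate. This is a minimal sketch, assuming each indicator is already available as a plain count; the names `Check`, `StatusCertificate`, `certify`, the check functions and all threshold values are hypothetical illustrations, not the planned service design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Check:
    """Outcome of a single (hypothetical) certification rule."""
    name: str
    passed: bool
    detail: str


@dataclass
class StatusCertificate:
    """Hypothetical record stating whether an Information Space can be published."""
    issued_at: str
    checks: list = field(default_factory=list)

    @property
    def certified(self) -> bool:
        # Certified only if every rule passed.
        return all(c.passed for c in self.checks)


def check_oa_ratio(oa: int, non_oa: int, max_ratio: float) -> Check:
    # The OA / non-OA ratio must not exceed a configured threshold.
    ratio = oa / non_oa if non_oa else float("inf")
    return Check("oa_ratio", ratio <= max_ratio, f"ratio={ratio:.2f}, max={max_ratio}")


def check_max_drop(name: str, previous: int, current: int, max_drop: float = 0.10) -> Check:
    # A count may grow, but fails if it falls by max_drop (e.g. 10%) or more
    # with respect to the previous status.
    drop = (previous - current) / previous if previous else 0.0
    return Check(name, drop < max_drop, f"previous={previous}, current={current}, drop={drop:.1%}")


def check_growth(name: str, previous: int, current: int, strict: bool = True) -> Check:
    # A new harvest of the same data source should collect more objects than the
    # previous one (strict=False also accepts an unchanged count).
    ok = current > previous if strict else current >= previous
    return Check(name, ok, f"previous={previous}, current={current}")


def certify(checks: list) -> StatusCertificate:
    # Bundle the rule outcomes into a timestamped certificate.
    return StatusCertificate(datetime.now(timezone.utc).isoformat(), checks)


# Usage with made-up numbers: the inferred relationships dropped by 12%.
cert = certify([
    check_oa_ratio(oa=400, non_oa=600, max_ratio=1.0),
    check_max_drop("inferred_relationships", previous=1_000_000, current=880_000),
    check_growth("harvest:example-repository", previous=5_200, current=5_350),
])
for c in cert.checks:
    print(c.name, "OK" if c.passed else "FAIL", c.detail)
print("certified:", cert.certified)
```

A certificate with a failed check would be the trigger for notifying data curators, who can fix the issue or, if tolerable, publish the Information Space as is; keeping every issued certificate yields the history described in (c).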

Task Timeline (Including Deliverables & Milestones)

  • D8.4 Data flows and dynamics monitoring services. The deliverable will sketch the functional requirements of the services and the intended internal architecture [CNR, R, M11].
  • Milestones
    • M8.3 Data flows and dynamics monitoring services [M12, M24, M36].

Areas of priority (where to concentrate first)

  • Provision area: we will focus on quality indicators exhibited in the “provision area” (namely the SOLR index and the Redis key-value store). Indicators will be extracted once the provision workflow has ended, and the results will be compared against each other (i.e. SOLR vs. Redis) and against their own values over the last k runs.
  • Native area: the monitoring tool will extract useful metrics and indicators about the ingestion and processing of native records as they enter the OpenAIRE infrastructure.
  • Thorough definition of monitoring scenarios, metrics of interest to be extracted and controls to be checked.
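The provision-area comparison can be sketched as two checks: SOLR vs. Redis counts for the same run, and current counts vs. recent history. The sketch assumes per-entity-type counts have already been extracted from both stores (how they are queried is not shown), and it uses the mean of the last k runs as the baseline, which is only one possible reading of comparing against the k-last steps in history. Function names, entity types and thresholds are hypothetical.

```python
from statistics import mean


def compare_stores(solr_counts: dict, redis_counts: dict, tolerance: float = 0.0) -> list:
    """Return the entity types whose counts disagree between SOLR and Redis."""
    issues = []
    for entity in sorted(set(solr_counts) | set(redis_counts)):
        s, r = solr_counts.get(entity, 0), redis_counts.get(entity, 0)
        base = max(s, r)
        # Relative difference above the tolerance is reported (0.0 = any difference).
        if base and abs(s - r) / base > tolerance:
            issues.append(f"{entity}: solr={s} redis={r}")
    return issues


def compare_with_history(current: dict, history: list, k: int = 3,
                         max_deviation: float = 0.10) -> list:
    """Flag indicators deviating from the mean of the last k runs by more than max_deviation."""
    window = history[-k:]
    issues = []
    for name, value in current.items():
        past = [run[name] for run in window if name in run]
        if not past:
            continue  # no history yet for this indicator
        baseline = mean(past)
        if baseline and abs(value - baseline) / baseline > max_deviation:
            issues.append(f"{name}: current={value} baseline={baseline:.0f}")
    return issues


# Usage with made-up counts.
solr = {"publication": 1_000_000, "dataset": 50_000, "organization": 20_000}
redis = {"publication": 1_000_000, "dataset": 48_000, "organization": 20_000}
history = [
    {"publication": 950_000, "dataset": 40_000},
    {"publication": 970_000, "dataset": 41_000},
    {"publication": 990_000, "dataset": 42_000},
]
print(compare_stores(solr, redis))
print(compare_with_history(solr, history, k=3))
```

Here the dataset count differs between the two stores and also jumps about 22% above its recent baseline, so it is reported by both checks, while publications stay within the 10% band.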

Foreseen Integration with other Work Packages and Tasks

  • WP6:
    • Integration of the data flow monitoring framework with the OpenAIRE infrastructure, and monitoring of the production and beta environments and their operation.
    • Continuous monitoring and over-time persistence of metrics and indicators returned by the validator service (metrics yet to be defined).
  • WP9: persistence of metrics and indicators of interest and provision of their history to the OpenAIRE web portal via public API.
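Persisting metric history per environment and exposing it to the web portal could look roughly like the in-memory sketch below. `MetricHistory`, the metric name `validator.valid_records` and the JSON payload shape are hypothetical; a real deployment would need a persistent store and an HTTP layer for the public API, neither of which is shown.

```python
import json
from datetime import datetime, timezone
from typing import Optional


class MetricHistory:
    """Hypothetical append-only history of indicator values per environment.

    Kept in memory for illustration only; records are lost on restart.
    """

    def __init__(self) -> None:
        self._records: list = []

    def record(self, name: str, value: float, environment: str,
               at: Optional[datetime] = None) -> None:
        # Timestamp each observation (assumed UTC) so it can be plotted over time.
        when = at or datetime.now(timezone.utc)
        self._records.append({"metric": name, "environment": environment,
                              "value": value, "timestamp": when.isoformat()})

    def series(self, name: str, environment: str) -> list:
        # All observations of one metric in one environment, oldest first
        # (ISO strings sort chronologically when they share the same UTC offset).
        return sorted((r for r in self._records
                       if r["metric"] == name and r["environment"] == environment),
                      key=lambda r: r["timestamp"])

    def to_json(self, name: str, environment: str) -> str:
        # Payload a public API endpoint could return to the portal.
        points = [{"t": r["timestamp"], "v": r["value"]}
                  for r in self.series(name, environment)]
        return json.dumps({"metric": name, "environment": environment, "points": points})


# Usage with made-up values for the production and beta environments.
metrics = MetricHistory()
metrics.record("validator.valid_records", 9_500, "production",
               at=datetime(2020, 1, 1, tzinfo=timezone.utc))
metrics.record("validator.valid_records", 9_800, "production",
               at=datetime(2020, 2, 1, tzinfo=timezone.utc))
metrics.record("validator.valid_records", 120, "beta")
print(metrics.to_json("validator.valid_records", "production"))
```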

Communication Strategy: when and how to raise awareness among the consortium of updates in the task

  • The portal development team will be informed whenever a new release of the data monitoring framework becomes available, so that it can report meaningful indicators to end users.