Integration of data repositories into EOSC based on community approaches

Introduction

The agriculture, food and environment research community faces challenges common to all research communities: finding and publishing data easily, preserving it, and facilitating its processing and analysis through computing solutions.

Examples of needs:

“I need to integrate innovative services that allow researchers to analyse data and publish it easily. Our information system must guarantee long-term storage of experimental data” - Engineer, Phenome-Emphasis community

“I need an easy access to the data produced on the experimental platforms. I need to correlate various datasets. I need an easy access to publish curated datasets or models.” - Research Engineer, Phenome-Emphasis community

To address these needs, this use case aims to create a flexible federated research data ecosystem for the agrifood community through four aspects:

long-term data preservation,
connecting data repositories,
virtual research environments,
cloud computing.

Challenges addressed

Data findability and reusability
Data integration
Data processing
Reproducibility

Benefits through EOSC-Pillar

By using the EOSC-Pillar Federated FAIR Data Space (F2DS), data providers and repositories will be able to make their data findable, accessible, and reusable by the whole community within the context of EOSC. As a direct consequence, this task will enable and/or increase interoperability among the repositories. Furthermore, the use case will leverage EOSC data services such as B2SAFE in order to implement long-term preservation of the institutional data repositories.

With EOSC distribution, the agrifood community as a whole will gain access to a research environment in which to process, analyse and visualise data in situ on appropriate compute infrastructure, without needing to download it first, fostering collaboration and cross-fertilisation.

Highlights

What has been achieved

Widening the initial agrifood scope through collaboration with partners.
Deployment of a Virtual Research Environment in D4Science.
Deployment of OpenStack over INRAE’s and France Grille’s infrastructures.
Provisioning of Kubernetes Clusters based on INRAE’s and France Grille’s infrastructures.
Deployment of JupyterHub on INRAE’s infrastructures.
Deployment of Renku on INRAE’s infrastructures.
Deployment of Binder on INRAE’s infrastructures.
Mapping between the CINES archiving tool (VITAM) and Data INRAE metadata.
Creation of a Data INRAE OpenAPI definition to help integrate data from Dataverse-based repositories into the Federated FAIR Data Space (F2DS).
Development of a connector between Dataverse-based data repositories and the D4Science VRE.
Development of a connector between Dataverse-based data repositories and JupyterHub.
Setup of a connector between the CINES archiving tool (VITAM) and Data INRAE to automatically archive Data INRAE’s data in VITAM.
Connection of the INRAE JupyterHub to the D4Science platform as a Jupyter notebook provider.
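
As an illustration of what a connector between a Dataverse-based repository and a notebook environment has to do at minimum, the sketch below builds a query for the public Dataverse Search API (/api/search, with the q, type and per_page parameters) and extracts dataset identifiers and titles from the JSON response. It is not the connector developed in this use case: the base URL, the function names and the sample response are assumptions made for the example, and the sample is parsed offline so that no network access is needed.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only; replace it with the
# address of the Dataverse installation you actually want to query.
BASE_URL = "https://dataverse.example.org"


def build_search_url(base_url, query, per_page=10):
    """Build a Dataverse Search API URL restricted to datasets."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def extract_datasets(payload):
    """Return (persistent id, title) pairs from a Search API JSON response."""
    body = json.loads(payload)
    if body.get("status") != "OK":
        raise ValueError("search request did not succeed")
    return [
        (item.get("global_id"), item.get("name"))
        for item in body["data"]["items"]
        if item.get("type") == "dataset"
    ]


# Hand-written sample in the shape of a Search API response.
# The identifier and title are invented for the example.
SAMPLE_RESPONSE = json.dumps({
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {
                "type": "dataset",
                "name": "Example wheat phenotyping dataset",
                "global_id": "doi:10.0000/EXAMPLE",
            }
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url(BASE_URL, "wheat phenotyping"))
    for persistent_id, title in extract_datasets(SAMPLE_RESPONSE):
        print(persistent_id, title)
```

In a notebook, the URL returned by build_search_url would be fetched with an HTTP client and the response text passed to extract_datasets; a real connector would also need to handle authentication, pagination and file download.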

Next steps

Developing a connector between Dataverse and Renku.
Developing a connector between Dataverse and Binder.
Setting up interoperability between the Fraunhofer Marketplace and F2DS.
Setting up interoperability between the Fraunhofer Marketplace and Dataverse-based repositories.