The Federated FAIR Data Space

Today’s problem is not whether we have enough data, but whether we can find, access, aggregate and reuse the vast amount of data that already exists. This is particularly true when data come from various scientific domains and communities. Addressing this problem requires a framework for the interoperability of data and, more importantly, of their descriptions (i.e. metadata). It is therefore not enough to copy the metadata of each repository into a single database: the metadata must also be made interoperable through a common, enriched description format, thus making the data FAIR.

Within EOSC-Pillar, Work Package 5 (WP5) aims to address this major problem by creating an innovative solution called the Federated FAIR Data Space (F2DS). This solution will make it easy to align metadata, providing unified search and data retrieval over multiple datasets from heterogeneous, distributed, community-specific repositories.

The architecture of the F2DS has been designed, and the first bundle of components and tools was released internally during the first year of the project. The proposed solution integrates existing, state-of-the-art tools that support the data FAIRification process. Our goal is to create an innovative federated data space that allows data repository owners to easily register their repository, FAIRify their data descriptions and subsequently publish and share FAIR datasets. Our approach offers a generic framework that can be easily deployed and reused for many different purposes (e.g. sharing data between project partners, or creating a dedicated F2DS for COVID-19). To achieve this objective, we take advantage of professional DevOps tools (containers and Kubernetes) to ease deployment and management on any infrastructure and for any purpose.

On one side, the F2DS offers a graphical interface that allows data repository owners to easily publish their datasets by mapping their metadata to the DCAT standard and to semantic artefacts (ontologies, thesauri, controlled vocabularies, etc.). The goal is to allow data producers to publish their datasets in the simplest and most automated way possible, without having to make major adaptations to their data repository. The main requirement is that data repositories offer an extensive RESTful API. To register data resources, we designed an initial workflow for publishing datasets based on the following steps:

  • description of the repository,
  • description of the APIs for accessing metadata,
  • description of the available datasets, and
  • enrichment of the metadata with semantic artefacts.

Data owners can export and reuse the information provided during these various steps to add their repository in another F2DS.
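
The four registration steps can be pictured as a single descriptor that a repository owner fills in once and then exports for reuse. The Python below is a minimal sketch under stated assumptions, not the F2DS implementation: the class `RepositoryRegistration`, its fields, the example repository, its URLs and the concept URI are all hypothetical. Only the property names (`dcterms:title`, `dcterms:description`, `dcat:keyword`, `dcat:theme`) are taken from the DCAT vocabulary.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RepositoryRegistration:
    """Hypothetical descriptor covering the four registration steps."""
    # Step 1: description of the repository
    name: str
    homepage: str
    # Step 2: description of the API used to access metadata
    metadata_endpoint: str
    # Step 3: mapping from the repository's own field names to DCAT terms
    field_mapping: dict = field(default_factory=dict)
    # Step 4: semantic enrichment, local keyword -> concept URI
    concept_mapping: dict = field(default_factory=dict)

    def to_dcat(self, record: dict) -> dict:
        """Translate one native metadata record into DCAT-style properties."""
        dataset = {"@type": "dcat:Dataset"}
        for native_field, dcat_term in self.field_mapping.items():
            if native_field in record:
                dataset[dcat_term] = record[native_field]
        # Enrich: attach a concept URI for every keyword that has a mapping.
        keywords = dataset.get("dcat:keyword", [])
        themes = [self.concept_mapping[k] for k in keywords if k in self.concept_mapping]
        if themes:
            dataset["dcat:theme"] = themes
        return dataset

    def export(self) -> str:
        """Serialize the registration so it can be reused in another F2DS."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def load(cls, text: str) -> "RepositoryRegistration":
        """Rebuild a registration from a previously exported JSON string."""
        return cls(**json.loads(text))


# Illustrative values only; none of these names or URLs refer to a real service.
registration = RepositoryRegistration(
    name="Example Marine Repository",
    homepage="https://repository.example.org",
    metadata_endpoint="https://repository.example.org/api/records",
    field_mapping={
        "title": "dcterms:title",
        "abstract": "dcterms:description",
        "tags": "dcat:keyword",
    },
    concept_mapping={"plankton": "https://vocab.example.org/concept/plankton"},
)

record = {
    "title": "Plankton counts 2019",
    "abstract": "Weekly samples.",
    "tags": ["plankton", "survey"],
}
dcat_record = registration.to_dcat(record)

# Export and re-import, as a data owner might do when joining another F2DS.
restored = RepositoryRegistration.load(registration.export())
```

Keeping the field and concept mappings as plain data, rather than code, is what makes the export step cheap in this sketch: the same JSON document could be loaded unchanged by a second deployment.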

On the other side, the F2DS will offer a single search interface for researchers. For this purpose, the F2DS is built with the needs of the various EOSC-Pillar use cases in mind, which cover a wide range of scientific domains (marine biology, earth science, humanities, cultural heritage, agronomy and biodiversity, biomedicine, and others). Once populated, the EOSC-Pillar F2DS will give researchers from the different communities access to a large, multi-disciplinary collection of datasets, and will let them address transversal scientific questions by connecting the F2DS to specific tools such as workflow engines and virtual research environments (VREs; e.g. D4Science, Jupyter notebooks).
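
Once metadata from different repositories share the same DCAT properties, a single query can run over all of them. The following is a minimal sketch of that idea, assuming the records have already been harmonized into DCAT-style dictionaries. The function `search`, the repository names and the sample records are hypothetical and do not describe the actual F2DS search service.

```python
def search(catalogue: dict, term: str) -> list:
    """Return (repository, title) pairs whose metadata mention `term`.

    `catalogue` maps a repository name to a list of DCAT-style records.
    Matching is a case-insensitive substring test over title,
    description and keywords.
    """
    needle = term.lower()
    hits = []
    for repository, records in catalogue.items():
        for record in records:
            haystack = " ".join(
                [record.get("dcterms:title", ""), record.get("dcterms:description", "")]
                + list(record.get("dcat:keyword", []))
            ).lower()
            if needle in haystack:
                hits.append((repository, record.get("dcterms:title", "")))
    return hits


# Illustrative records from two hypothetical community repositories.
catalogue = {
    "marine-repo": [
        {"dcterms:title": "Plankton counts 2019", "dcat:keyword": ["plankton", "survey"]},
    ],
    "agronomy-repo": [
        {
            "dcterms:title": "Soil moisture series",
            "dcterms:description": "Field data near a plankton monitoring site.",
        },
        {"dcterms:title": "Wheat yield trials", "dcat:keyword": ["wheat"]},
    ],
}

results = search(catalogue, "plankton")
```

Because every record uses the same property names, the search code needs no knowledge of how each repository stores its metadata natively.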

In addition to the development of the F2DS, WP5 is identifying and collating ontologies from the various communities involved in the project. This exercise will enable the enrichment of metadata in the EOSC-Pillar F2DS. To support this activity, a survey of the semantic landscape has started. The goal is to gather and aggregate the ontologies in use, enabling the semantic alignment of metadata and helping to harmonize the interfaces and improve the ease of use of the F2DS.

While integration work continues, we aim to release a first test version within six months. At the end of the project, the F2DS will be offered in the EOSC-Pillar service portfolio and will be available for easy deployment by anyone who would like to create their own F2DS to suit their particular needs.

