Leveraging the experience of Software Heritage, this task aims to design and pilot a solution for preserving massive collections of software source code (billions of source code files with links to publications) in the EOSC eTDR service (European Trusted Digital Repository). More specifically, as part of this use case EOSC-Pillar will:
A large part of the technical and scientific knowledge being developed today resides in software. Preserving this universal body of knowledge has become as important as preserving research articles and data sets. Software preservation is a pillar of reproducibility, because software plays a central role in every phase of research in all fields of science. To reproduce an experiment, one must know the exact version of the software that was used.
Software Heritage will ensure availability and traceability of software, providing the missing vertex in the triangle of scientific preservation, together with open data and open access.
The first challenge addressed is to offer the research community a way to persistently and uniquely identify any piece of software source code. The second is to allow EOSC members to archive source code artefacts for the long term, thus supporting reproducibility.
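As an illustration of what a persistent, unique identifier for source code can look like, the following minimal sketch computes an intrinsic identifier for a single file's content. It assumes the identifiers in question follow the Software Heritage persistent identifier (SWHID) scheme for content objects, in which the prefix `swh:1:cnt:` is followed by the Git-compatible SHA-1 of the file content. The function names `git_blob_sha1` and `content_identifier` are hypothetical and chosen only for this example.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the Git-compatible SHA-1 of a file's content.

    Git hashes a blob as sha1(b"blob <length>\\0" + content), so the
    result depends only on the bytes of the file, not on its name or path.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def content_identifier(content: bytes) -> str:
    """Build a content identifier of the form swh:1:cnt:<hash>.

    Assumption: the SWHID scheme for content objects; the source text
    does not name the identifier scheme explicitly.
    """
    return "swh:1:cnt:" + git_blob_sha1(content)


if __name__ == "__main__":
    # The same bytes always yield the same identifier, wherever they are stored.
    print(content_identifier(b"print('hello')\n"))
```

Because the identifier is derived from the content itself, anyone holding a copy of the file can recompute and verify it without consulting a central registry.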
Long-term preservation guarantees will be achieved through replication across the main Software Heritage archive and its network of mirrors, and by depositing a copy of the Software Heritage archive in Vitam, the CINES long-term archiving solution.
Thanks to EOSC-Pillar, users of the archival services offered by Software Heritage will benefit from integration with other common services and infrastructure offered by EOSC-Pillar.
One example of this is authentication and authorization infrastructure (AAI). Users will not need to create dedicated accounts on the Software Heritage infrastructure to access the provided APIs: any identity provider integrated into EOSC-Pillar can be used.
Second, thanks to EOSC-Pillar, Software Heritage has gained access to collaboration opportunities with partners interested in replicating the archive in their own infrastructure, such as CINES with Vitam. In the future, other interested partners will be able to do the same, building on the experience gained.
In the future, it will also become possible to partner with providers of computing resources, enabling research groups to conduct large-scale experiments on the source code artefacts archived by Software Heritage.