Heritage Science, i.e. the application of scientific experimental methods to the analysis of cultural heritage artefacts, produces a large quantity of numeric data that are only loosely related to the cultural objects to which the analyses were applied. The lack of standard data models for the different technologies employed makes interoperability between datasets almost impossible. On the other hand, the same cultural objects and the activities carried out on them (studies, interventions, etc.) are described in text documents, usually with very basic metadata.
This situation requires human intervention to link the documentation of scientific analyses to the documentation of the cultural object, e.g. to link chemical and physical analyses to a study by an art historian; in the end, this prevents data re-use and data-driven research.
Creating a Linked Data system that makes these links explicit requires a common semantic model covering both the scientific data and the content of text reports. Such a model has been created as an application profile of CIDOC CRM, the standard ontology used for cultural heritage documentation, based on previous extensions such as CRM-PE, developed within the PARTHENOS project, and CRM-SCI.
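As an illustration of what a common model makes possible, the following minimal sketch represents one measurement and one text report as triples that both point to the same cultural object. It is not the application profile itself: the URIs are hypothetical, and only class and property names from the CIDOC CRM core (E16 Measurement, E31 Document, E22 Human-Made Object, P39 measured, P70 documents) are used, whereas the actual profile also draws on CRM-PE and CRM-SCI.

```python
# Illustrative triples only; EX and all identifiers below are hypothetical.
EX = "http://example.org/"

AMPHORA = EX + "object/amphora-7"

triples = {
    (AMPHORA, "rdf:type", "crm:E22_Human-Made_Object"),
    # Scientific side: a measurement carried out on the object.
    (EX + "measurement/1", "rdf:type", "crm:E16_Measurement"),
    (EX + "measurement/1", "crm:P39_measured", AMPHORA),
    # Textual side: a report that documents the same object.
    (EX + "report/12", "rdf:type", "crm:E31_Document"),
    (EX + "report/12", "crm:P70_documents", AMPHORA),
}


def subjects_pointing_to(graph, obj, predicate):
    """Return, sorted, every subject linked to `obj` through `predicate`."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


def documentation_for(graph, obj):
    """Collect the scientific and the textual documentation of one object."""
    return {
        "measurements": subjects_pointing_to(graph, obj, "crm:P39_measured"),
        "documents": subjects_pointing_to(graph, obj, "crm:P70_documents"),
    }


result = documentation_for(triples, AMPHORA)
```

Because both kinds of documentation refer to the same object identifier, a single lookup retrieves the measurement and the report together, with no human needed to connect them.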
Based on this newly created schema, the data encoding may be obtained in two steps: first, the numeric results of the scientific experiments are uploaded together with their metadata to create the scientific side; then, the text data annotated with the same ontology are uploaded. The system will then create the links between the two components of the documentation system. However, the manual annotation of texts is cumbersome and time-consuming, so a text mining tool based on machine learning has been developed to enrich the metadata of the texts, which may then be linked to the metadata accompanying the digital outcomes of the scientific analyses.
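The linking step can be pictured with the following minimal sketch. It assumes, purely for illustration, that each analysis record names the object it was applied to and that each text annotation carries an entity type assigned manually or by the text mining tool; the record types, field names and the label-matching rule are hypothetical and do not describe the actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRecord:
    """Metadata accompanying one scientific analysis (hypothetical fields)."""
    analysis_id: str
    object_label: str
    technique: str


@dataclass(frozen=True)
class TextAnnotation:
    """One entity annotated in a text report (hypothetical fields)."""
    report_id: str
    entity_text: str
    entity_type: str


def normalise(label):
    """Lower-case a label and collapse whitespace so variants compare equal."""
    return " ".join(label.lower().split())


def link_records(analyses, annotations):
    """Pair each analysis with every report mentioning the same object."""
    reports_by_label = {}
    for ann in annotations:
        if ann.entity_type == "OBJECT":
            key = normalise(ann.entity_text)
            reports_by_label.setdefault(key, set()).add(ann.report_id)

    links = []
    for rec in analyses:
        matches = reports_by_label.get(normalise(rec.object_label), set())
        for report_id in sorted(matches):
            links.append((rec.analysis_id, report_id))
    return links


analyses = [
    AnalysisRecord("xrf-001", "Amphora 7", "XRF"),
    AnalysisRecord("c14-002", "Bone fragment 3", "radiocarbon dating"),
]
annotations = [
    TextAnnotation("report-12", "amphora  7", "OBJECT"),
    TextAnnotation("report-12", "Rome", "PLACE"),
    TextAnnotation("report-15", "Amphora 7", "OBJECT"),
]
links = link_records(analyses, annotations)
```

In this toy data the XRF analysis is linked to both reports that mention the amphora, while the radiocarbon analysis remains unlinked because no annotated report mentions its object.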
Sharing the tool and supporting the integration of scientific data with heritage reports in a cloud environment may foster cross-fertilisation among heritage professionals, scientists and data scientists. It may be a significant step forward in the digital transformation of cultural heritage and, at the very least, advance the digitisation of heritage assets and their documentation by showcasing the new opportunities opened by such integration.
The system is fully working and has been tested on a set of archaeological reports and scientific data. To train the NER tool, reports were manually encoded by experts; the automatic annotation was then tested on other reports, with good results. A larger set of training texts would be necessary to extend its usability to other fields or languages.