Biodiversity Information Science and Standards : Conference Abstract
|
Corresponding author: Peter Grobe (p.grobe@leibniz-zfmk.de)
Received: 17 Jun 2019 | Published: 26 Jun 2019
© 2019 Peter Grobe, Roman Baum, Philipp Bhatty, Christian Köhler, Sandra Meid, Björn Quast, Lars Vogt
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Grobe P, Baum R, Bhatty P, Köhler C, Meid S, Quast B, Vogt L (2019) From Data to Knowledge: A semantic knowledge graph application for curating specimen data. Biodiversity Information Science and Standards 3: e37412. https://doi.org/10.3897/biss.3.37412
|
The landscape of currently existing repositories of specimen data consists of isolated islands, with each applying its own underlying data model. Using standardized protocols such as DarwinCore or ABCD, specimen data and metadata are exchanged and published on web portals such as GBIF.
However, data models differ across repositories. This can lead to problems when comparing and integrating content from different systems. for example, in one system there is a field with the label 'determination', in another there is a field with the label 'taxonomic identification'. Both might refer to the same concepts of organism identification process (e.g., 'obi:organism identification assay'; http://purl.obolibrary.org/obo/OBI_0001624), but the intuitive meaning of the content is not clear and the understanding of the providers of the information might differ from that of the users. Without additional information, data integration across isolated repositories is thus difficult and error-prone. As a consequence, interoperability and retrievability of data across isolated repositories is difficult.
Linked Open Data (LOD) promises an improvement. URIs can be used for concepts that are ideally created and accepted by a community and that provide machine-readable meanings. LOD thereby supports transfer of data into information and then into knowledge, thus making the data FAIR (Findable, Accessible, Interoperable, Reusable;
Storing specimen collection data as semantic Knowledge Graphs provides semantic transparency and machine-readability of data. Semantic Knowledge Graphs are graphs that are based on the syntax of ‘Subject – Property – Object’ of the Resource Description Framework (RDF). The ‘Subject’ and ‘Property’ position is taken by URIs and the ‘Object’ position can be taken either by a URI or by a label or value. Since a given URI can take the ‘Subject’ position in one RDF statement and the ‘Object’ position in another RDF statement, several RDF statements can be connected to form a directed labeled graph, i.e. a semantic graph. Semantic Knowledge Graphs are graphs in which each described specimen and its parts and properties possess their own URI and thus can be individually referenced. These URIs are used to describe the respective specimen and its properties using the RDF syntax. Additional RDF statements specify the ontology class that each part and property instantiates. The reference to the URIs of the instantiated ontology classes guarantees the Findability, Interoperability, and Reusability of information contained in semantic Knowledge Graphs. Specimen collection data contained in semantic Knowledge Graphs can be made Accessible in a human-readable form through an interface and in a machine-readable form through a SPARQL endpoint (https://en.wikipedia.org/wiki/SPARQL). As a consequence, semantic Knowledge Graphs comply with the FAIR guiding principles. By using URIs for the semantic Knowledge Graph of each specimen in the collection, it is also available as LOD.
With semantic Morph·D·Base, we have implemented a prototype to this approach that is based on Semantic Programming. We present the prototype and discuss different aspects of how specimen collection data are handled. By using community created terminologies and standardized methods for the contents created (e.g. species identification) as well as URIs for each expression, we make the data and metadata semantically transparent and communicable.
The source code for Semantic Programming and for semantic Morph·D·Base is available from https://github.com/SemanticProgramming. The prototype of semantic Morph·D·Base can be accessed here: https://proto.morphdbase.de.
semantic programming; FAIR data principle; knowledge graph application; collection data
Peter Grobe
Biodiversity_Next 2019
DFG LIS: Information Infrastructures for Research Data (http://gepris.dfg.de/gepris/projekt/248394582)
eScience-Compliant Standards for Morphology