Synospecies, a Linked Data Application to Explore Taxonomic Names

Reto Gmür; Donat Agosti; Guido Sautter

doi:10.3897/biss.6.93707

Biodiversity Information Science and Standards : Conference Abstract

Reto Gmür^‡, Donat Agosti^‡, Guido Sautter^‡

‡ Plazi, Bern, Switzerland

Corresponding author: Donat Agosti (agosti@plazi.org)

Received: 19 Aug 2022 | Published: 23 Aug 2022

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Gmür R, Agosti D, Sautter G (2022) Synospecies, a Linked Data Application to Explore Taxonomic Names. Biodiversity Information Science and Standards 6: e93707. https://doi.org/10.3897/biss.6.93707

Abstract

Synospecies is a linked data application to explore changes in taxonomic names (Gmür and Agosti 2021). Taxonomic treatments are the underlying source of truth for the establishment of taxa and for the assignment and re-assignment of names. They are sections of publications in scientific journals that document the features or distribution of taxa following highly formalized conventions, and they shape our understanding of global biodiversity (Catapano 2010). Plazi, a not-for-profit organization dedicated to liberating knowledge, extracts the relevant information from these treatments and makes it publicly available in digital form. Depending on the original form of a publication, a treatment undergoes several processing steps, each of which affects the digital artifacts extracted from the treatment's original publication. The treatments are digitized, the text is annotated with a specialized editor, and the result is cross-referenced and enhanced with other sources (Agosti and Sautter 2018). After these steps, the annotated text is transformed to the different structured data formats used by other digital biodiversity platforms (e.g., Global Biodiversity Information Facility: Plazi.org taxonomic treatment database using Darwin Core Archive), generic linked data tools (e.g., lod view; RDF2h Browser) and other consuming applications (e.g., Ocellus via Zenodeo using XML; openBioDiv using XML; HMW using XML; Biotic interaction browser using TaxPub XML; opendata.swiss using RDF).

While these transformations have been taking place for a long time, Plazi is now experimenting with making the process more transparent: with the Plazi Actionable Accessible Archive (PAAA) architecture, both the addition and the modification of digitized treatments trigger an extensible set of workflows that are immediately executed on the GitHub platform. Not only are the exact definition and code of every workflow publicly accessible, but the results, errors and execution time of every single workflow run are accessible as well. This offers an unprecedented degree of transparency and flexibility in the data processing, which we have implemented as a prototype for the creation of the RDF data used by Synospecies. As in the W3C GRDDL recommendation (https://www.w3.org/TR/grddl/), XSLT is used to transform XML to RDF/XML, a concrete syntax from the early days of RDF that is still supported by most RDF tools, allowing the data to be read as RDF. The XSLT document used is part of the bundled gg2rdf GitHub action (https://github.com/plazi/gg2rdf), together with the other transformation steps required to generate a result in the RDF Turtle format, which is both human- and machine-readable. On the GitHub Actions page of the treatments-xml repository (https://github.com/plazi/treatments-xml/actions) one can see that every commit to this repository triggers a workflow run that takes approximately 12 minutes to execute. After that, the transformation results are available in the treatments-rdf repository (https://github.com/plazi/treatments-rdf/). The commit of RDF data to the treatments-rdf repository triggers a webhook that loads the newly added data into the Plazi triplestore, making it available in Synospecies almost immediately.
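
To make the idea of mapping treatment XML to RDF Turtle more concrete, the following is a minimal sketch only. It does not reproduce the gg2rdf action: instead of XSLT and an intermediate RDF/XML step, it uses Python's standard-library XML parser to write Turtle directly. The XML element and attribute names, the `http://example.org/` namespace and the `ex:` properties are hypothetical placeholders chosen for illustration, not the actual Plazi schema or vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified treatment XML (NOT the real Plazi schema).
SAMPLE_XML = """<treatment id="T1">
  <taxonomicName genus="Aus" species="bus"/>
  <publication doi="10.1234/example"/>
</treatment>"""

# Placeholder namespace; the real data uses its own vocabularies.
EX = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def treatment_to_turtle(xml_text: str) -> str:
    """Map one simplified treatment element to a small Turtle document."""
    root = ET.fromstring(xml_text)
    name = root.find("taxonomicName")
    publication = root.find("publication")

    subject = f"<{EX}treatment/{root.get('id')}>"
    genus = escape_literal(name.get("genus"))
    species = escape_literal(name.get("species"))
    doi = publication.get("doi")

    lines = [
        f"@prefix ex: <{EX}> .",
        "",
        f"{subject} a ex:Treatment ;",
        f'    ex:genus "{genus}" ;',
        f'    ex:species "{species}" ;',
        f"    ex:publishedIn <https://doi.org/{doi}> .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(treatment_to_turtle(SAMPLE_XML))
```

In the pipeline described above, the equivalent mapping is expressed declaratively in an XSLT document and runs inside a GitHub workflow, so the mapping rules are versioned and publicly inspectable alongside the data they produce.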