Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Claus Weiland (cweiland@senckenberg.de)
Received: 22 Sep 2021 | Published: 23 Sep 2021
© 2021 Jonas Grieb, Claus Weiland, Alex Hardisty, Wouter Addink, Sharif Islam, Sohaib Younis, Marco Schmidt
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Grieb J, Weiland C, Hardisty A, Addink W, Islam S, Younis S, Schmidt M (2021) Machine Learning as a Service for DiSSCo’s Digital Specimen Architecture. Biodiversity Information Science and Standards 5: e75634. https://doi.org/10.3897/biss.5.75634
International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital) make geo- and biodiversity specimen data freely, fully and directly accessible.
Complementary, overarching infrastructure initiatives like the European Open Science Cloud (EOSC) were established to enable mutual integration, interoperability and reusability of multidisciplinary data streams including biodiversity, Earth system and life sciences.
Natural Science Collections (NSC) are of particular importance for such multidisciplinary and internationally linked infrastructures, since they provide hard scientific evidence by allowing direct traceability of derived data (e.g., images, sequences, measurements) to physical specimens and material samples in NSC.
To open up the large amounts of trait and habitat data and to link these data to digital resources like sequence databases (e.g., ENA), taxonomic infrastructures (e.g., GBIF) or environmental repositories (e.g., PANGAEA), specimen data must be annotated with rich (meta)data early in the digitization process, and bridging technologies are needed to facilitate the reuse of these data.
This was addressed in recent studies (
However, such applications of artificial intelligence, whether (sub-symbolic) machine learning or (symbolic) ontology-based annotation, are rarely integrated into the workflows of NSC management systems, which are the essential repositories for the aforementioned integration of data streams.
This motivated the development of a deep learning-based trait extraction and coherent Digital Specimen (DS) annotation service providing “Machine Learning as a Service” (MLaaS), with a special focus on interoperability with the core services of DiSSCo, notably the DS Repository (nsidr.org) and the Specimen Data Refinery.
Taking up the use case of detecting and classifying regions of interest (ROIs) on herbarium scans, we demonstrate an MLaaS prototype for DiSSCo that uses the digital object framework Cordra to manage DS and to instantly annotate digital objects with extracted trait features (and ROIs) based on the DS specification openDS.
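The annotation step described above can be illustrated with a minimal sketch that converts detector output (class label, confidence score, bounding box) into annotation records for one digital specimen. This is not the prototype's implementation: the `Detection` class, the `build_roi_annotations` function, the record field names, the specimen identifier and the image URL are all hypothetical and do not follow the openDS schema, and no Cordra API call is made.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One detected region; hypothetical structure, not a detector's real output type."""
    label: str                       # e.g. "leaf" or "flower"
    score: float                     # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def build_roi_annotations(specimen_id: str, image_uri: str,
                          detections: List[Detection],
                          min_score: float = 0.5) -> List[Dict]:
    """Convert detections into annotation records, dropping low-confidence ones.

    The field names below are illustrative assumptions, not openDS terms.
    """
    annotations = []
    for det in detections:
        if det.score < min_score:
            continue  # skip detections below the confidence threshold
        x_min, y_min, x_max, y_max = det.box
        annotations.append({
            "target": specimen_id,
            "image": image_uri,
            "type": "RegionOfInterest",
            "label": det.label,
            "score": det.score,
            "region": {
                "x": x_min,
                "y": y_min,
                "width": x_max - x_min,
                "height": y_max - y_min,
            },
        })
    return annotations


# Example input with made-up values
detections = [
    Detection("leaf", 0.93, (120, 340, 560, 910)),
    Detection("flower", 0.31, (600, 200, 700, 320)),
]
payload = build_roi_annotations(
    "hypothetical/specimen-001",        # placeholder identifier
    "https://example.org/scan.jpg",     # placeholder image URL
    detections,
)
print(json.dumps(payload, indent=2))
```

In a service setting, a payload of this kind would be serialized and attached to the corresponding digital object in the repository; how that is done depends on the repository's own interface and is left out here.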
Source code available at: https://github.com/jgrieb/plant-detection-service
Keywords: FAIR Digital Object, Distributed System of Scientific Collections, plant organ detection, deep learning, region-based convolutional neural network, image annotation
Jonas Grieb
TDWG 2021
H2020-INFRADEV-2019-2020 – Grant Agreement No. 871043
Deutsche Forschungsgemeinschaft (DFG) - Project number 316452578