Biodiversity Information Science and Standards: Conference Abstract
Machine Learning as a Service for DiSSCo’s Digital Specimen Architecture
Jonas Grieb‡, Claus Weiland‡, Alex Hardisty§, Wouter Addink|,¶, Sharif Islam|,¶, Sohaib Younis#, Marco Schmidt¤
‡ Senckenberg - Leibniz Institution for Biodiversity and Earth System Research, Frankfurt am Main, Germany
§ School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| Naturalis Biodiversity Center, Leiden, Netherlands
¶ Distributed System of Scientific Collections - DiSSCo, Leiden, Netherlands
# Department of Mathematics and Computer Science, Philipps-University Marburg, Marburg, Germany
¤ Palmengarten der Stadt Frankfurt, Frankfurt am Main, Germany
Open Access

Abstract

International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital) make geo- and biodiversity specimen data freely, fully and directly accessible. 

Complementary, overarching infrastructure initiatives like the European Open Science Cloud (EOSC) were established to enable mutual integration, interoperability and reusability of multidisciplinary data streams including biodiversity, Earth system and life sciences (De Smedt et al. 2020). 

Natural Science Collections (NSC) are of particular importance for such multidisciplinary and internationally linked infrastructures, since they provide hard scientific evidence by allowing derived data (e.g., images, sequences, measurements) to be traced directly to the physical specimens and material samples they hold.

Opening up the large amounts of trait and habitat data and linking them to digital resources like sequence databases (e.g., ENA), taxonomic infrastructures (e.g., GBIF) or environmental repositories (e.g., PANGAEA) requires proper annotation of specimen data with rich (meta)data early in the digitization process, alongside bridging technologies that facilitate the reuse of these data.

This was addressed in recent studies (Younis et al. 2018, Younis et al. 2020), where we employed computational image processing and artificial intelligence technologies (Deep Learning) for the classification and extraction of features like organs and morphological traits from digitized collection data (with a focus on herbarium sheets).

However, such applications of artificial intelligence, whether (sub-symbolic) machine learning or (symbolic) ontology-based annotation, are rarely integrated into the workflows of NSC management systems, which are the essential repositories for the aforementioned integration of data streams.

This motivated the development of a Deep Learning-based trait extraction and coherent Digital Specimen (DS) annotation service that provides “Machine Learning as a Service” (MLaaS), with a special focus on interoperability with the core services of DiSSCo, notably the DS Repository (nsidr.org) and the Specimen Data Refinery (Walton et al. 2020), as well as reusability within the data fabric of EOSC.

Taking up the use case of detecting and classifying regions of interest (ROIs) on herbarium scans, we demonstrate an MLaaS prototype for DiSSCo that uses the digital object framework Cordra to manage DS and to instantly annotate digital objects with extracted trait features (and ROIs), based on the DS specification openDS (Islam et al. 2020).
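The annotation step described above can be illustrated with a minimal Python sketch. It is not the prototype's implementation: the `Detection` class, the helper functions, the `annotations` / `regionsOfInterest` field names, the class labels and the example identifier are all hypothetical, and they do not reproduce the openDS schema or the Cordra API. The sketch only shows the general idea of filtering detector output by confidence, clamping bounding boxes to the image, and attaching the result to a copy of a digital specimen record.

```python
import copy
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Detection:
    """One detected region of interest in pixel coordinates (hypothetical type)."""
    label: str    # e.g. "leaf" or "flower"; class names are illustrative only
    score: float  # detector confidence in [0, 1]
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def to_roi_annotations(detections: List[Detection], width: int, height: int,
                       min_score: float = 0.5) -> List[Dict]:
    """Keep confident detections and clamp their boxes to the image bounds."""
    rois = []
    for det in detections:
        if det.score < min_score:
            continue  # drop low-confidence detections
        x0, y0 = max(0, det.x_min), max(0, det.y_min)
        x1, y1 = min(width, det.x_max), min(height, det.y_max)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the image
        rois.append({"label": det.label,
                     "score": round(det.score, 3),
                     "bbox": [x0, y0, x1, y1]})
    return rois


def annotate_specimen(specimen: Dict, rois: List[Dict], model_name: str) -> Dict:
    """Return a copy of the specimen record with one annotation appended.

    The field names used here are assumptions, not the openDS specification.
    """
    annotated = copy.deepcopy(specimen)
    annotated.setdefault("annotations", []).append({
        "type": "plantOrganDetection",
        "generatedBy": model_name,
        "regionsOfInterest": rois,
    })
    return annotated


# Illustrative input: a made-up record and made-up detector output.
specimen = {"id": "test/123", "type": "DigitalSpecimen",
            "image": {"width": 1000, "height": 1500}}
detections = [
    Detection("leaf", 0.92, 100, 200, 400, 700),
    Detection("flower", 0.31, 500, 100, 600, 220),    # below the threshold
    Detection("fruit", 0.80, 900, 1400, 1100, 1600),  # partly outside the image
]
rois = to_roi_annotations(detections, width=1000, height=1500)
annotated = annotate_specimen(specimen, rois, model_name="example-detector")
print(json.dumps(annotated, indent=2))
```

Returning a modified copy rather than editing the record in place keeps the original object unchanged until the annotated version is written back to the repository, which in the prototype is handled by Cordra.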

Source code available at: https://github.com/jgrieb/plant-detection-service

Keywords

FAIR Digital Object, Distributed System of Scientific Collections, plant organ detection, deep learning, region-based convolutional neural network, image annotation

Presenting author

Jonas Grieb

Presented at

TDWG 2021

Funding program

H2020-INFRADEV-2019-2020 – Grant Agreement No. 871043

Deutsche Forschungsgemeinschaft (DFG) - Project number 316452578

References
