Biodiversity Information Science and Standards : Conference Abstract
Corresponding author: Matthew Collins (mcollins@acis.ufl.edu)
Received: 11 Apr 2018 | Published: 03 Jul 2018
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation: Collins M, Yeole G, Frandsen P, Dikow R, Orli S, Figueiredo R (2018) A Pipeline for Deep Learning with Specimen Images in iDigBio - Applying and Generalizing an Examination of Mercury Use in Preparing Herbarium Specimens. Biodiversity Information Science and Standards 2: e25699. https://doi.org/10.3897/biss.2.25699
iDigBio
Using the GUODA (Global Unified Open Data Access) infrastructure, we have built a model pipeline for applying user-defined processing to any subset of the images stored in iDigBio. The pipeline runs on servers in the Advanced Computing and Information Systems lab (ACIS), alongside the iDigBio storage system. Processing is performed with Apache Spark, the Hadoop Distributed File System (HDFS), and Mesos. In front of this architecture we have placed a Jupyter notebook server, which gives end users an easy environment, with deep learning libraries for Python already loaded, in which to write their own models. Users can access the stored data and images, manipulate them as their work requires, and make the results publicly available on GitHub.
As an example of how this pipeline can be used in research, we applied a neural network developed at the Smithsonian Institution to identify herbarium sheets that were prepared with hazardous mercury-containing solutions.
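The pattern described above, selecting a subset of image records and applying a user-defined function to each one in parallel, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record fields, the `run_pipeline` and `select_subset` helpers, and the `mean_brightness` function are all hypothetical, and a standard-library thread pool stands in for the Spark cluster. It is not the GUODA or iDigBio API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical image records. The field names are illustrative only and
# do not reflect the actual iDigBio schema; raw bytes stand in for images.
RECORDS: List[dict] = [
    {"id": "a1", "family": "Asteraceae", "image": bytes([10, 200, 30])},
    {"id": "a2", "family": "Poaceae", "image": bytes([255, 255, 255])},
    {"id": "a3", "family": "Asteraceae", "image": bytes([0, 0, 90])},
]


def select_subset(records: List[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """Keep only the records the user is interested in."""
    return [record for record in records if predicate(record)]


def mean_brightness(image: bytes) -> float:
    """Toy user-defined processing step: average byte value of the image."""
    return sum(image) / len(image)


def run_pipeline(
    records: List[dict],
    predicate: Callable[[dict], bool],
    process: Callable[[bytes], float],
    workers: int = 2,
) -> Dict[str, float]:
    """Apply `process` to every selected image, in parallel.

    The thread pool is a stand-in (an assumption of this sketch) for the
    distributed map that a cluster framework would perform.
    """
    subset = select_subset(records, predicate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda record: process(record["image"]), subset))
    return {record["id"]: value for record, value in zip(subset, values)}


if __name__ == "__main__":
    result = run_pipeline(
        RECORDS, lambda record: record["family"] == "Asteraceae", mean_brightness
    )
    print(result)
```

In a real deployment the user-defined function would more plausibly be a trained model's prediction step, and the subset would come from a query over specimen metadata rather than an in-memory list.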
Keywords: iDigBio, deep learning, image, Spark
Presenting author: Matthew Collins
Presented at: Biodiversity Information Standards (TDWG) 2018, Dunedin, NZ