Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Hanane Ariouat (hanane.ariouat@ird.fr)
Received: 04 Sep 2023 | Published: 06 Sep 2023
© 2023 Hanane Ariouat, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Jean-Daniel Zucker, Edi Prifti, Eric Chenin
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ariouat H, Sklab Y, Pignal M, Vignes Lebbe R, Zucker J-D, Prifti E, Chenin E (2023) Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques. Biodiversity Information Science and Standards 7: e112161. https://doi.org/10.3897/biss.7.112161
Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are gaining interest in the scientific community, as their exploration can help us understand serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in the use of computer-based processing techniques, such as Deep Learning (DL). Herbarium specimens can nevertheless be difficult to process and analyze because they contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons: often the paper's color darkens and, in some cases, approaches the color of the plants.
Neural network models are well suited to the analysis of herbarium specimens and can often disregard such visual noise. However, in some cases the model focuses on these elements, which can lead to poor generalization when analyzing new data in which these visual elements are not present (
In this work, we aim to extract clean, high-resolution masks at the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. We proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen and removes the remaining pixels, which belong to non-plant elements regarded as noise. A removed pixel is set to zero (black). Fig.
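The pixel-removal step described above can be illustrated with a minimal sketch. The abstract does not specify how the detection and segmentation outputs are combined, so everything here is an assumption made for illustration: the function name `extract_plant`, the boolean `plant_mask` (as a segmentation stage might produce), and the `(x_min, y_min, x_max, y_max)` box format for detected non-plant elements are all hypothetical. The only behavior taken from the text is that plant pixels are retained at the original resolution and every removed pixel is set to zero (black).

```python
import numpy as np


def extract_plant(image, plant_mask, noise_boxes=()):
    """Return a copy of `image` with every non-plant pixel set to zero.

    image:       H x W x 3 uint8 array (the scanned mounting sheet)
    plant_mask:  H x W boolean array, True where a pixel belongs to the plant
                 (hypothetical output of a segmentation stage)
    noise_boxes: iterable of (x_min, y_min, x_max, y_max) pixel boxes around
                 detected non-plant elements such as labels or barcodes
                 (hypothetical output of an object detection stage)
    """
    if plant_mask.shape != image.shape[:2]:
        raise ValueError("mask and image must have the same height and width")

    # Start from the segmentation mask, then discard anything that falls
    # inside a box detected as a non-plant element.
    keep = plant_mask.astype(bool).copy()
    for x_min, y_min, x_max, y_max in noise_boxes:
        keep[y_min:y_max, x_min:x_max] = False

    # Removed pixels stay at zero (black); retained pixels are copied over.
    out = np.zeros_like(image)
    out[keep] = image[keep]
    return out


# Toy example: a uniform 4 x 6 "scan" with a small plant region and one
# detected noise box that overlaps part of it.
image = np.full((4, 6, 3), 200, dtype=np.uint8)
plant_mask = np.zeros((4, 6), dtype=bool)
plant_mask[1:3, 1:5] = True
boxes = [(3, 0, 6, 4)]  # covers columns 3 to 5 on every row

result = extract_plant(image, plant_mask, boxes)
```

Because the output has the same shape as the input, the result keeps the resolution of the original scan. In this sketch a detected box always overrides the segmentation mask, which is one possible design choice among several.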
In the first stage, we manually annotated a dataset of 950 images with bounding boxes. We identified (Fig.
Our approach removes the background noise from herbarium scans and extracts clean plant images. It is an important step before using these images in different deep learning models. However, the quality of the extractions varies depending on the quality of the scans, the condition of the specimens, and the paper used. For example, extractions were more accurate for specimens whose color differs from that of the background than for specimens whose color is close to that of the background. To overcome this limitation, we aim to use some of the obtained extractions to create a training dataset, and then to develop and train a generative deep learning model that generates masks delimiting the plants.
Keywords: image noise, deep learning, herbarium mounting sheet
Presenting author: Youcef Sklab
Presented at: TDWG 2023