Biodiversity Information Science and Standards :
Conference Abstract
Corresponding author: Youcef Sklab (youcef.sklab@ird.fr)
Received: 27 Aug 2024 | Published: 28 Aug 2024
© 2024 Youcef Sklab, Hanane Ariouat, Youssef Boujydah, Yassine Qacami, Edi Prifti, Jean-daniel Zucker, Régine Vignes Lebbe, Eric Chenin
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Sklab Y, Ariouat H, Boujydah Y, Qacami Y, Prifti E, Zucker J-daniel, Vignes Lebbe R, Chenin E (2024) Towards a Deep Learning-Powered Herbarium Image Analysis Platform. Biodiversity Information Science and Standards 8: e135629. https://doi.org/10.3897/biss.8.135629
Global digitization efforts have archived millions of specimen scans in herbarium collections worldwide, and these collections are essential for studying plant evolution and biodiversity. ReColNat currently hosts over 10 million images. However, analyzing these datasets poses major challenges for botanical research. The application of deep learning in biodiversity analyses, particularly in analyzing herbarium scans, has shown promising results across numerous tasks (
Within the e-Col+ project (ANR-21-ESRE-0053), we are developing multiple deep learning models aimed at identifying plant morphological traits. We have developed pipelines and models for cleaning, analyzing, and transforming herbarium images, including models for: i) detecting non-vegetal elements, such as barcodes, envelopes, and labels; ii) detecting plant organs, including leaves, flowers, and fruits; and iii) segmenting plant parts for image cleaning. We are also developing models for classification tasks related to various morphological traits.
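As an illustration of how a detector's output could feed a cleaning step, the following minimal sketch blanks out the bounding boxes of non-vegetal elements in an image. It is not the project's implementation (which, as stated above, relies on segmentation models for cleaning): the `Detection` record, the `NON_VEGETAL` label set, the `mask_non_vegetal` function, and the representation of an image as a list of pixel rows are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set; the abstract names barcodes, envelopes, and labels as examples.
NON_VEGETAL = {"barcode", "envelope", "label"}


@dataclass(frozen=True)
class Detection:
    """One box predicted by a (hypothetical) object detector."""
    label: str
    x0: int  # left column, inclusive
    y0: int  # top row, inclusive
    x1: int  # right column, exclusive
    y1: int  # bottom row, exclusive


def mask_non_vegetal(image: List[List[int]],
                     detections: List[Detection],
                     fill: int = 0) -> List[List[int]]:
    """Return a copy of `image` with every non-vegetal box overwritten by `fill`."""
    cleaned = [row[:] for row in image]  # copy so the input stays untouched
    height = len(cleaned)
    width = len(cleaned[0]) if height else 0
    for det in detections:
        if det.label not in NON_VEGETAL:
            continue  # plant organs are kept as they are
        # Clip the box to the image bounds before overwriting pixels.
        for y in range(max(det.y0, 0), min(det.y1, height)):
            for x in range(max(det.x0, 0), min(det.x1, width)):
                cleaned[y][x] = fill
    return cleaned
```

Boxes for plant organs pass through unchanged, so the same detection list can serve both the cleaning step and later trait analysis.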
Validating these models, improving their generalization, and making them easily usable by end users all require deploying them within a generic platform. This platform, called PlantAI and currently under development within the e-Col+ project, should make it easy to deploy models for testing during development and allow users to load annotations for new traits in order to train a model and add it to the existing catalog. The platform is based on a microservice architecture and allows users to upload images, create custom datasets, and access various AI models for image analysis.
The platform is composed of four main modules, as illustrated in Fig.
The third module is the dataset manager, which handles the metadata and annotations associated with the specimens. These annotations can be produced either by expert users or by AI models. The fourth module is the AI model management module, which makes models available for generating AI annotations of specimens. During the development lifecycle of an AI model, users can create datasets and annotate them with AI models. These annotations can be in one of two states: validated by experts or non-validated. Users collaborating on a project can flag errors in the model predictions and leave comments to explain their evaluations. These expert corrections can then be used to retrain the models and thus improve their performance.
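The annotation lifecycle described above can be sketched as a small data model. This is a hedged illustration, not the PlantAI schema: the class and field names (`Annotation`, `Source`, `Status`, `corrected_value`), the specimen identifiers, and the rule used by `retraining_labels` to pick training labels are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Source(Enum):
    EXPERT = "expert"
    AI_MODEL = "ai_model"


class Status(Enum):
    NON_VALIDATED = "non_validated"
    VALIDATED = "validated"


@dataclass
class Annotation:
    """A trait value attached to a specimen (hypothetical schema)."""
    specimen_id: str
    trait: str
    value: str
    source: Source
    status: Status = Status.NON_VALIDATED
    corrected_value: Optional[str] = None
    comments: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """An expert confirms the annotation as it stands."""
        self.status = Status.VALIDATED

    def flag_error(self, corrected_value: str, comment: str) -> None:
        """An expert records the right value and explains the evaluation."""
        self.corrected_value = corrected_value
        self.comments.append(comment)


def retraining_labels(annotations: List[Annotation]) -> List[Tuple[str, str, str]]:
    """Collect (specimen_id, trait, label) triples that are safe to retrain on.

    Assumption: an expert correction takes precedence, validated annotations
    keep their value, and anything else is left out.
    """
    labels = []
    for ann in annotations:
        if ann.corrected_value is not None:
            labels.append((ann.specimen_id, ann.trait, ann.corrected_value))
        elif ann.status is Status.VALIDATED:
            labels.append((ann.specimen_id, ann.trait, ann.value))
    return labels
```

Keeping the model's original value next to the expert's correction preserves the evidence needed to measure how often a model is wrong, in addition to supplying cleaner labels for retraining.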
This platform will be highly beneficial for botanists, enhancing the efficiency and effectiveness of biodiversity analyses based on herbarium scans. We aim to provide users with a catalog of AI models through this platform and to allow them to import their own datasets, with their own annotations, for traits of their choice. Users will be able to select a model from the AI model catalog and train it on their dataset. The resulting model will then be deployed automatically and made available for AI annotation. The annotations it produces will in turn appear automatically in the filtering and navigation interface, so that AI annotations are integrated into navigation dynamically.
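The select, train, deploy, and annotate workflow described above can be sketched as follows. Everything here is hypothetical and in-memory: `ModelCatalog`, its method names, and the toy `majority_trainer` (which stands in for a real deep learning training routine) are assumptions for illustration, not the platform's API.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]            # (image_id, trait_value) from a user dataset
Predictor = Callable[[str], str]     # maps an image_id to a predicted trait value
Trainer = Callable[[List[Example]], Predictor]


class ModelCatalog:
    """In-memory stand-in for a model catalog (hypothetical API)."""

    def __init__(self) -> None:
        self._trainers: Dict[str, Trainer] = {}
        self._deployed: Dict[str, Predictor] = {}

    def register_architecture(self, name: str, trainer: Trainer) -> None:
        """Add a trainable model to the catalog."""
        self._trainers[name] = trainer

    def train_and_deploy(self, architecture: str, deployed_name: str,
                         dataset: List[Example]) -> None:
        """Train the chosen model on a user dataset and deploy the result."""
        self._deployed[deployed_name] = self._trainers[architecture](dataset)

    def deployed_models(self) -> List[str]:
        return sorted(self._deployed)

    def annotate(self, deployed_name: str, image_ids: List[str]) -> Dict[str, str]:
        """Produce AI annotations for the given images with a deployed model."""
        predict = self._deployed[deployed_name]
        return {image_id: predict(image_id) for image_id in image_ids}


def majority_trainer(dataset: List[Example]) -> Predictor:
    """Toy trainer: always predict the most frequent trait value seen."""
    most_common = Counter(value for _, value in dataset).most_common(1)[0][0]
    return lambda image_id: most_common
```

A possible usage, with made-up identifiers:

```python
catalog = ModelCatalog()
catalog.register_architecture("majority", majority_trainer)
catalog.train_and_deploy(
    "majority", "leaf-margin-v1",
    [("img1", "entire"), ("img2", "serrate"), ("img3", "entire")],
)
annotations = catalog.annotate("leaf-margin-v1", ["img9"])
```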
Keywords: herbarium scans, AI annotation, navigation interface
Youcef Sklab