Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Maya Sahraoui (sahraoui@isir.upmc.fr)
Received: 13 Sep 2023 | Published: 14 Sep 2023
© 2023 Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Sahraoui M, Sklab Y, Pignal M, Vignes Lebbe R, Guigue V (2023) Leveraging Multimodality for Biodiversity Data: Exploring joint representations of species descriptions and specimen images using CLIP. Biodiversity Information Science and Standards 7: e112666. https://doi.org/10.3897/biss.7.112666
In recent years, the field of biodiversity data analysis has witnessed significant advances, with numerous models emerging to process and extract valuable insights from diverse data sources. One notable area of progress lies in the analysis of species descriptions, where structured knowledge extraction techniques have gained prominence. These techniques aim to automatically extract relevant information, such as taxonomic classifications and morphological traits, from unstructured text.
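As a hedged illustration of this kind of extraction (not the pipeline used in this work), the sketch below tags a fragment of a species description with a generic pretrained named-entity-recognition model from the Hugging Face transformers library; the default model and its label set are stand-ins for a model trained on taxonomic and morphological text.

```python
# Illustrative sketch: tagging a species description with a generic NER
# pipeline. The default English model is a placeholder; a real system
# would use a model fine-tuned on taxonomic/morphological vocabulary.
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")

description = (
    "Leaves opposite, ovate, 5-12 cm long, with serrate margins; "
    "flowers white, arranged in terminal panicles."
)

for entity in ner(description):
    print(entity["word"], entity["entity_group"], round(entity["score"], 2))
```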
Furthermore, object detection on specimen images has emerged as a powerful tool in biodiversity research. By leveraging computer vision algorithms, researchers can automatically detect and localize regions of interest, such as individual plant organs, within digitized specimen images.
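To make this step concrete, here is a minimal sketch that runs a generic pretrained Faster R-CNN detector from torchvision over a specimen image; the file name is hypothetical, and a production pipeline would substitute a detector fine-tuned on annotated herbarium sheets.

```python
# Illustrative sketch: generic object detection on a digitized specimen.
# A real pipeline would use a detector fine-tuned on specimen annotations
# (e.g., plant organs, labels, barcodes) rather than COCO classes.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("specimen_sheet.jpg")  # hypothetical image path
with torch.no_grad():
    prediction = model([preprocess(img)])[0]

# Keep only confident detections (threshold chosen arbitrarily).
for box, score in zip(prediction["boxes"], prediction["scores"]):
    if score > 0.8:
        print(box.tolist(), float(score))
```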
On the other hand, multimodal learning, an emerging field in artificial intelligence (AI), focuses on developing models that can effectively process and learn from multiple modalities, such as text and images.
Structured knowledge extraction from species descriptions and object detection on specimen images synergistically enhance biodiversity data analysis. This integration leverages the complementary strengths of textual and visual data to yield deeper insights: structured information extracted from descriptions improves the search, classification, and correlation of biodiversity data, while object detection enriches textual descriptions by providing visual evidence for verifying and validating species characteristics.
To tackle the challenges posed by the massive volume of specimen images available at the Herbarium of the National Museum of Natural History in Paris, we have chosen to implement the CLIP (Contrastive Language-Image Pretraining) model (Radford et al. 2021), which learns a joint embedding space for images and text through contrastive pretraining on large collections of image-caption pairs.
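To make the joint representation concrete, the sketch below scores one specimen image against candidate morphological descriptions using the publicly released openai/clip-vit-base-patch32 checkpoint; the image path and description strings are illustrative.

```python
# Illustrative sketch: CLIP embeds the image and each text into a shared
# space; logits_per_image holds the scaled cosine similarities.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("herbarium_sheet.jpg")  # hypothetical specimen image
texts = [
    "leaves opposite with serrate margins",
    "flowers arranged in terminal panicles",
]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # relative match per text
```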
Fine-tuning the CLIP model on our dataset of species descriptions and specimen images is crucial for adapting it to our domain. By exposing the model to our data, we enhance its ability to understand and represent biodiversity characteristics: training on our labeled dataset allows the model to refine its representations and capture domain-specific patterns.
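A minimal sketch of one such fine-tuning step, assuming batches of (image, description) pairs aligned by index; it relies on the symmetric contrastive loss that transformers' CLIPModel computes when return_loss=True, and the learning rate is an illustrative placeholder.

```python
# Illustrative sketch: one contrastive fine-tuning step on paired
# specimen images and species descriptions.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").train()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def training_step(images, descriptions):
    """images: list of PIL.Image; descriptions: list of str, paired by index."""
    inputs = processor(text=descriptions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    # return_loss=True computes CLIP's symmetric InfoNCE loss
    # (image-to-text and text-to-image) over the batch.
    outputs = model(**inputs, return_loss=True)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```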
Using the fine-tuned CLIP model, we aim to develop an efficient search engine for the Herbarium's vast collection. Users will be able to query the engine with morphological keywords, and it will match textual descriptions against specimen images to return relevant results. This research aligns with current directions in AI for biodiversity data, paving the way for innovative approaches to the conservation and understanding of our planet's biodiversity.
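A retrieval sketch under the same assumptions: specimen images are embedded once offline, and a free-text morphological query is ranked against them by cosine similarity (file names and the query string are hypothetical).

```python
# Illustrative sketch: text-to-image retrieval over precomputed
# specimen embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["sheet_001.jpg", "sheet_002.jpg"]  # hypothetical collection
with torch.no_grad():
    img_inputs = processor(images=[Image.open(p) for p in paths],
                           return_tensors="pt")
    img_emb = model.get_image_features(**img_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

    query = "ovate leaves with serrate margins"
    txt_inputs = processor(text=[query], return_tensors="pt", padding=True)
    txt_emb = model.get_text_features(**txt_inputs)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

scores = (txt_emb @ img_emb.T).squeeze(0)  # cosine similarities
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(paths[i], float(scores[i]))
```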
Keywords: joint representation learning, image captioning, multimodal named entity recognition, visual grounding
Presenting author: Maya Sahraoui
Presented at: TDWG 2023