Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Elie M. Saliba (elie.saliba@mnhn.fr)
Received: 15 Aug 2024 | Published: 16 Aug 2024
© 2024 Elie Saliba, Eric Chenin, Régine Vignes Lebbe
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Saliba EM, Chenin E, Vignes Lebbe R (2024) Traits for Efficient Navigation and Search in Natural History Collections. Biodiversity Information Science and Standards 8: e134816. https://doi.org/10.3897/biss.8.134816
The application of AI methods is increasingly fueling research in biodiversity. One of the objectives of the French national e-COL+ project is to enable collections to benefit from the innovative contributions of image recognition and text mining.
The preceding e-ReColNat project aimed to centralize all the images and data from natural history collections on a single platform (
To go further, it is essential to identify some potential traits that AI models can be trained to recognize. To this end, scientists and curators with expertise in different taxa and conservation techniques were consulted. The taxonomic knowledge of the interviewees covers botany, zoology and paleontology. Their expertise encompasses different types of collections, such as fossils, thin sections, herbarium sheets, alcohol-preserved and dry specimens (Table 1).
Table 1: Taxa and corresponding categories covered by the interviews for the e-COL+ trait project
Taxon | Type of collection
Botany |
Angiosperma | Herbarium sheets
Filicophyta | Herbarium sheets
Paleozoology |
Archaeocyatha | Thin sections, fossils
Ammonita | Fossils
Zoology |
Mammalia | Skeletons
Pisces | Skeletons, alcohol-preserved specimens
Amphibia | Alcohol-preserved specimens
Aves | Skeletons, dry specimens
Some of the traits mentioned are specific to individual specimens, including visible polymorphic morpho-anatomical characteristics, such as the shape of a leaf. Another possible category of traits is related to the specific preservation state of the specimen, such as early traces of pyrite rot (see
Other traits can be deduced from species-level descriptions. These are broader than the traits mentioned above and include morpho-anatomy that is not visible at the level of the specimen, for example the potential size of a tree. Ecology, phenology, spatial distribution and relationships with humans were also cited. Natural language processing (NLP) techniques are used to extract these traits (
The main issue that emerged during the interviews was vocabulary. For example, the terms ‘toothed’ and ‘denticulate’, both used to describe a leaf margin, are difficult to differentiate strictly. Moreover, some collections at the Muséum national d'Histoire naturelle (MNHN) need their current metadata improved upstream (missing or weak taxonomic identifications, database population still in progress) before AI-derived data can be implemented effectively.
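The vocabulary issue can be illustrated with a minimal sketch. It assumes, purely for illustration, that near-synonymous leaf-margin terms are grouped under one preferred label before searching. The synonym table, the record identifiers and the function names are hypothetical and do not come from the e-COL+ project.

```python
import re

# Hypothetical synonym table: each variant term maps to one preferred label.
# The grouping is an illustrative assumption, not an agreed vocabulary.
PREFERRED_TERM = {
    "toothed": "toothed",
    "dentate": "toothed",
    "denticulate": "toothed",
    "entire": "entire",
}


def normalize_margin_terms(description: str) -> set[str]:
    """Return the preferred leaf-margin labels found in a free-text description."""
    words = re.findall(r"[a-z]+", description.lower())
    return {PREFERRED_TERM[w] for w in words if w in PREFERRED_TERM}


def search(records: dict[str, str], label: str) -> list[str]:
    """Return the sorted identifiers of records whose text maps to the label."""
    return sorted(
        record_id
        for record_id, text in records.items()
        if label in normalize_margin_terms(text)
    )


# Made-up example records, keyed by a hypothetical sheet identifier.
records = {
    "sheet-001": "Leaf margin denticulate, apex acute.",
    "sheet-002": "Margin entire.",
    "sheet-003": "Leaves toothed near the apex.",
}

print(search(records, "toothed"))  # ['sheet-001', 'sheet-003']
```

Grouping terms this way trades precision for recall: a query for ‘toothed’ also returns specimens described as ‘denticulate’, but the finer distinction is lost unless the original term is stored alongside the preferred label.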
In conclusion, by systematically identifying and extracting traits relevant to navigation and search from a vast array of images, the e-COL+ project enhances the usability of French collections. Collaboration between scientists, curators and AI experts ensures the robustness and usefulness of the project's outcomes, paving the way for innovative research and applications.
artificial intelligence, training set, curation, e-COL+
Elie M. Saliba
SPNHC-TDWG 2024
The authors are grateful to D. Brabant, A. Kerner, A. Ohler, E. Pérez, P. Provini, I. Rouget, T. Bourgoin, F. Jabbour, M. Pignal, P. Pruvost, G. Rouhan (Muséum national d’Histoire naturelle); C. Loup (Université de Montpellier); and N. Bailly (University of British Columbia) for answering our questions.
This work was funded by the e-COL+ PIA (21-ESRE-0053).