Biodiversity Information Science and Standards:
Conference Abstract
Corresponding author: Alexis A.J. Joly (alexis.joly@inria.fr)
Received: 31 Aug 2021 | Published: 31 Aug 2021
© 2021 Hervé Goëau, Pierre Bonnet, Alexis Joly
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Goëau H, Bonnet P, Joly A (2021) AI-based Identification of Plant Photographs from Herbarium Specimens. Biodiversity Information Science and Standards 5: e73751. https://doi.org/10.3897/biss.5.73751
Automated plant identification has recently improved significantly due to advances in deep learning and the availability of large amounts of field photos. As an illustration, the classification accuracy of 10K species measured in the LifeCLEF challenge (
Nevertheless, for several centuries, botanists have systematically collected, catalogued, and stored plant specimens in herbaria. Considerable recent efforts by the biodiversity informatics community, such as DiSSCo (
A herbarium sheet (left) and a field photo (right) of the same individual plant (Unonopsis stipitata Diels).
To advance research on this topic, we built a large dataset that we shared as one of the challenges of the LifeCLEF 2020 (
Based on this dataset, about ten research teams have developed deep learning methods to address the challenge (including the authors of this abstract as the organizing team). A detailed description of these methods can be found in the technical notes written by the participating teams (
The domain adaptation methods themselves were of two types, those based on
In Table
Method | MRR | MRR on most difficult species
Best classical CNN | 0.011 | 0.004
Best classical CNN with additional training data | 0.039 | 0.007 |
Best domain adaptation method based on metric learning | 0.121 | 0.107 |
Best domain adaptation method based on adversarial regularization | 0.180 | 0.052 |
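The scores in the table are MRR values, which we read here as the standard Mean Reciprocal Rank: for each test query, the reciprocal of the rank at which the correct species appears in the returned list, averaged over all queries. The following is a minimal sketch of that computation, not the official evaluation code of the challenge. The function name, the argument names, and the optional `species_subset` filter (used here to mimic a score restricted to a chosen set of hard species) are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Optional, Sequence


def mean_reciprocal_rank(
    ranked_predictions: Sequence[Sequence[Hashable]],
    true_species: Sequence[Hashable],
    species_subset: Optional[Iterable[Hashable]] = None,
) -> float:
    """Mean Reciprocal Rank over a set of queries.

    ranked_predictions[i] lists candidate species for query i, best first.
    true_species[i] is the correct species for query i.
    species_subset (hypothetical option): if given, only queries whose
    true species belongs to this set are scored.
    """
    if len(ranked_predictions) != len(true_species):
        raise ValueError("one ranked list is required per query")

    subset = None if species_subset is None else set(species_subset)
    reciprocal_ranks = []
    for ranking, truth in zip(ranked_predictions, true_species):
        if subset is not None and truth not in subset:
            continue  # query is outside the evaluated subset
        if truth in ranking:
            # Ranks are 1-based: the top prediction contributes 1/1.
            reciprocal_ranks.append(1.0 / (list(ranking).index(truth) + 1))
        else:
            # The correct species was not returned at all.
            reciprocal_ranks.append(0.0)

    if not reciprocal_ranks:
        raise ValueError("no query left to evaluate")
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Toy example with three queries (species names are placeholders).
predictions = [
    ["sp_a", "sp_b", "sp_c"],  # correct species at rank 1
    ["sp_b", "sp_a", "sp_c"],  # correct species at rank 2
    ["sp_c", "sp_b", "sp_a"],  # correct species missing from the list
]
truths = ["sp_a", "sp_a", "sp_x"]

overall = mean_reciprocal_rank(predictions, truths)
hard_only = mean_reciprocal_rank(predictions, truths, species_subset={"sp_x"})
```

Under this reading, an MRR of 0.011 means that the correct species sits, on average, very far down the ranked list, whereas a score of 1.0 would mean that it is always ranked first.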
Classical deep learning models fail to identify plant photos when trained on digitized herbarium specimens. The best classical CNN trained on the provided data achieved a very low MRR score (0.011). Even with the use of additional training data (e.g., photos and digitized herbarium specimens from GBIF), the MRR score remains very low (0.039).
Domain adaptation methods provide significant improvement but the task remains challenging. The best MRR score (0.180) was achieved by using adversarial regularization (FSDA
No method fits all. As shown in Table
In 2021, the challenge was run again with additional information provided to train the models, namely species traits (plant life form, woodiness, and plant growth form). Using the species traits slightly improved the performance of the best adversarial adaptation method (with an MRR equal to 0.198).
In conclusion, the results of the experiments conducted are promising and demonstrate the potential interest of digitized herbarium data for automated plant identification. However, progress is still needed before integrating this type of approach into production applications.
Keywords: plant, identification, photos, herbarium, domain adaptation
Hervé Goëau
TDWG 2021