Biodiversity Information Science and Standards (BISS), ISSN 2535-0897, Pensoft Publishers
DOI: 10.3897/biss.3.37341
Conference Abstract
SS86 - Machine learning: an emerging toolkit for biodiversity science using museum collections

Accelerating the Automated Detection, Counting and Measurements of Reproductive Organs in Herbarium Collections in the Era of Deep Learning

Adán Mora-Fallas (1; adamora@ic-itcr.ac.cr), Hervé H.G. Goëau (2, 3; herve.goeau@cirad.fr), Susan Mazer (4), Natalie Love (4; https://orcid.org/0000-0002-5013-5478), Erick Mata-Montero (https://orcid.org/0000-0001-5471-164X), Pierre Bonnet (3, 2; pierre.bonnet@cirad.fr), Alexis A.J. Joly (5; alexis.joly@inria.fr)

1 School of Computing, Costa Rica Institute of Technology, Cartago, Costa Rica
2 CIRAD, UMR AMAP, Montpellier, France
3 AMAP, Univ Montpellier, CIRAD, CNRS, INRA, IRD, Montpellier, France
4 Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, California, United States of America
5 Inria, Zenith team, Montpellier, France
Biodiversity Information Science and Standards 3: e37341 (2019). Published 26 June 2019; received 14 June 2019.
Copyright 2019 Adán Mora-Fallas, Hervé H.G. Goëau, Susan Mazer, Natalie Love, Erick Mata-Montero, Pierre Bonnet, Alexis A.J. Joly.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Millions of herbarium records provide an invaluable legacy and a record of the spatial and temporal distributions of plants over centuries and across all continents (Soltis et al. 2018). Thanks to recent efforts to digitize most major natural history collections and make them publicly accessible, ecological and evolutionary patterns can now be investigated at unprecedented geographic scales (Carranza-Rojas et al. 2017, Lorieul et al. 2019). Nevertheless, biologists now face the problem of extracting basic information from a huge number of herbarium sheets, such as textual descriptions, organ counts, and measurements of various morphological traits. Deep learning technologies can dramatically accelerate this extraction by automating the routines of organ identification, counting, and measurement, thereby allowing biologists to spend more time on investigations such as phenological or geographic distribution studies.
Recent progress in instance segmentation, demonstrated by the Mask R-CNN method, is very promising for herbarium sheets, in particular for detecting with high precision the organs of interest on each specimen, including leaves, flowers, and fruits. However, like any deep learning approach, this method requires a significant number of labeled examples with fairly detailed outlines of individual organs. Creating such a training dataset can be very time-consuming and may discourage researchers. In this work, we propose to integrate the Mask R-CNN approach into a global system with an active learning mechanism (Sener and Savarese 2018) in order to minimize the number of organ outlines that researchers must annotate manually. The principle is to alternate cycles of manual annotation, training updates of the deep learning model, and predictions on the entire collection to be processed. The challenge for the active learning mechanism is then to estimate automatically, at each cycle, which objects would be most useful to outline in the next manual annotation cycle, so that an accurate model is learned in as few cycles as possible.
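The selection step of such an active learning cycle can be illustrated with a minimal sketch. The abstract does not state which selection criterion the system uses; the code below assumes the greedy k-center rule associated with the core-set approach of the cited reference (Sener and Savarese 2018), applied to feature vectors that are assumed to come from the current model. The function name `k_center_greedy` and the toy two-dimensional features are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def k_center_greedy(features: List[Vector], labeled: Set[int], budget: int) -> List[int]:
    """Greedily pick up to `budget` unlabeled indices.

    At each step the point farthest from every already selected point
    (labeled or just picked) is chosen, so the picks cover the feature
    space as evenly as possible. Hypothetical helper, not the authors' code.
    """
    if not labeled:
        raise ValueError("at least one labeled example is needed to start")
    selected = set(labeled)
    # Distance from each point to its nearest already selected point.
    min_dist = [min(math.dist(f, features[j]) for j in selected) for f in features]
    chosen: List[int] = []
    while len(chosen) < budget:
        candidates = [i for i in range(len(features)) if i not in selected]
        if not candidates:
            break  # everything is already labeled or picked
        # The candidate that is least well covered by the selected set.
        pick = max(candidates, key=lambda i: min_dist[i])
        chosen.append(pick)
        selected.add(pick)
        # Adding `pick` can only shrink the nearest-selected distances.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[pick]))
    return chosen


# Toy example: five objects described by 2-D features, object 0 already annotated.
toy_features = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (5.0, 5.0)]
print(k_center_greedy(toy_features, labeled={0}, budget=2))  # prints [3, 4]
```

In a full cycle, the returned indices would be sent to the annotators, the model would be retrained on the enlarged labeled set, and the features would be recomputed before the next selection.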
We discuss experiments addressing the effectiveness, the limits, and the annotation time required by our approach, in the context of a phenological study of more than 10,000 reproductive organs (buds, flowers, fruits and immature fruits) of Streptanthus tortuosus, a species known to be highly variable in appearance and therefore very difficult for an instance segmentation deep learning model to process.
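Once an instance segmentation model has produced per-organ predictions, counting and measuring reduce to simple bookkeeping. The following is a minimal sketch under stated assumptions: the `Detection` structure, the 0.5 score threshold, and the pixel-count area measure are hypothetical choices made for illustration, not the data format or measurements used in the study. Only the four class names come from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """One predicted organ instance (hypothetical structure)."""
    label: str             # e.g. "bud", "flower", "fruit", "immature fruit"
    score: float           # model confidence in [0, 1]
    mask: List[List[int]]  # binary mask, 1 where the organ covers a pixel


def count_organs(detections: List[Detection], min_score: float = 0.5) -> Dict[str, int]:
    """Count detections per organ class, ignoring low-confidence ones."""
    return dict(Counter(d.label for d in detections if d.score >= min_score))


def mask_area(mask: List[List[int]]) -> int:
    """Area of an organ in pixels: the number of 1-valued mask cells."""
    return sum(sum(row) for row in mask)


# Toy predictions for one herbarium sheet.
sheet = [
    Detection("flower", 0.92, [[1, 1], [1, 0]]),
    Detection("flower", 0.81, [[0, 1], [0, 1]]),
    Detection("bud", 0.67, [[1, 0], [0, 0]]),
    Detection("fruit", 0.30, [[1, 1], [1, 1]]),  # below threshold, not counted
]
counts = count_organs(sheet)
areas = [mask_area(d.mask) for d in sheet]
```

Converting pixel areas to physical units would additionally require the scale of each scanned sheet, which this sketch does not model.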
Keywords: Herbarium collection, phenology, phenophase, deep learning, instance detection, active learning, visual annotation

Biodiversity_Next 2019, Leiden, The Netherlands. A joint conference by The Global Biodiversity Information Facility (GBIF), a new pan-European Research Infrastructure initiative (DiSSCo), the national resource for digitized information about vouchered natural history collections (iDigBio), Consortium of European Taxonomic Facilities (CETAF), Biodiversity Information Standards (TDWG) and LifeWatch ERIC, the e-Science and Technology European Infrastructure for Biodiversity and Ecosystem Research.

Presenting author
Hervé Goëau
Presented at
Biodiversity_Next 2019
References
Carranza-Rojas J, Goeau H, Bonnet P, Mata-Montero E, Joly A (2017) Going deeper in the automated identification of Herbarium specimens. 17 (1): 181. DOI: 10.1186/s12862-017-1014-z
Lorieul T, Pearson KD, Ellwood ER, Goëau H, Molino J-F, Sweeney PW, Yost JM, Sachs J, Mata-Montero E, Nelson G, Soltis PS, Bonnet P, Joly A (2019) Toward a large-scale and deep phenological stage annotation of herbarium specimens: Case studies from temperate, tropical, and equatorial floras. 7 (3): e01233. DOI: 10.1002/aps3.1233
Sener O, Savarese S (2018) Active learning for convolutional neural networks: A core-set approach. https://openreview.net/forum?id=H1aIuk-RW
Soltis PS, Nelson G, James SA (2018) Green digitization: Online botanical collections data answering real-world questions. 6 (2): e1028. DOI: 10.1002/aps3.1028