Proceedings of TDWG : Conference Abstract
Using a Deep Convolutional Neural Network for Extracting Morphological Traits from Herbarium Images
Yue Zhu‡, Thibaut Durand§, Eric Chenin|, Marc Pignal¶, Patrick Gallinari§, Régine Vignes-Lebbe#
‡ Ecole Polytechnique, Palaiseau, France
§ UPMC Paris 6 - Sorbonne Universités -- LIP6, Paris, France
| IRD, Paris, France
¶ Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 – CNRS, MNHN, UPMC, EPHE Muséum national d’Histoire naturelle, Sorbonne Universités, Paris, France
# Institut de Systématique, Évolution, Biodiversité ISYEB - UMR 7205 – CNRS, MNHN, UPMC, EPHE UPMC Univ. Paris 06, Sorbonne Universités, Paris, France
Open Access

Abstract

Natural history collection data are now accessible through databases and web portals. However, ecological or morphological traits describing specimens are rarely recorded when the data are gathered, which limits queries and analyses. Manually tagging millions of specimens would be a huge task, even with the help of citizen science projects such as “les herbonautes” (http://lesherbonautes.mnhn.fr). On the other hand, deep learning methods that use convolutional neural networks (CNNs) have demonstrated their efficiency in various domains, such as computer vision (Krizhevsky et al. 2012, Azizpour et al. 2016), speech recognition (Abdel-Hamid et al. 2014) and face identification (Li et al. 2015, Freytag et al. 2016).

We aim to use deep learning to provide a visual representation of the words used to describe plants (e.g. simple or compound leaf) and to associate those words with specimens in the Paris herbarium. This will provide a semantic description of each of the 7 million images of the fully digitized collection of the Paris herbarium at the Muséum National d'Histoire Naturelle (MNHN, Paris, France). In a proof-of-concept project, we used a CNN pre-trained on the image database ImageNet (http://www.image-net.org) to identify 4 morphological traits of leaves, using 103,000 herbarium images from 11 different taxa: margin (entire / dentate), leaf attachment (opposite / alternate), leaf organization (simple / compound) and plant (woody / non-woody) (see Fig. 1). Seventy percent of the images are used to train the weights of the model (a supervised learning process that uses a training set already tagged for the 4 characters), 10% are used as a validation set to tune the hyper-parameters of the model and to avoid overfitting, and 20% are used as a test set to evaluate the generalization performance of the final neural network. The first results are encouraging, with over 80% success on the test set.

In a second step, we test whether the neural network overfits the training examples or can generalize to new taxa. If we restrict the training set to a small number of taxa (4 taxa containing 76% of the images), the success rate on the 7 other taxa (unseen during training) decreases drastically. A good sampling of the taxonomic diversity of plants therefore appears crucial to train the neural network. A second method visualizes the area of each image that the CNN detected as the most important for morphological character recognition (Durand et al. 2017). This method provides an explanatory view of the automatic recognition process.

In this poster we describe methods and results on botanical images for the different taxa. We discuss perspectives on image tagging of the Paris herbarium, and how to combine the citizen science project with automatic CNN image description in order to annotate images.
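
The two evaluation protocols described in this abstract, a random 70% / 10% / 20% split and a split that holds out whole taxa, can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the taxon names (`taxon_00`, `taxon_01`, ...), the toy dataset size and the function names are all assumptions made for illustration, and the abstract does not state how the images were actually partitioned or shuffled.

```python
import random


def make_toy_records(n_taxa=11, per_taxon=100):
    """Build a hypothetical image index: one dict per image, tagged with its taxon."""
    return [
        {"image_id": f"img_{t:02d}_{i:04d}", "taxon": f"taxon_{t:02d}"}
        for t in range(n_taxa)
        for i in range(per_taxon)
    ]


def random_split(records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle the records and cut them into train / validation / test lists.

    With this protocol every taxon is likely to appear in all three subsets.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 20% by default
    return train, val, test


def taxon_holdout_split(records, train_taxa):
    """Keep only the listed taxa for training; all other taxa are unseen at test time."""
    train_taxa = set(train_taxa)
    seen = [r for r in records if r["taxon"] in train_taxa]
    unseen = [r for r in records if r["taxon"] not in train_taxa]
    return seen, unseen
```

Calling `random_split(make_toy_records())` gives the three subsets of the first experiment, while `taxon_holdout_split(records, ["taxon_00", "taxon_01", "taxon_02", "taxon_03"])` mimics the second one, where the model is evaluated only on taxa it never saw during training. Comparing the success rate under the two protocols is what separates memorizing taxon-specific appearance from recognizing the morphological trait itself.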

Figure 1.

The deep learning process using the convolutional neural network ResNet.

Keywords

Deep learning, convolutional neural network, herbarium, image, morphology

Presenting author

Régine Vignes-Lebbe

References
