Augmentation Methods for Biodiversity Training Data

Mario Lasseck

doi:10.3897/biss.3.37307

Biodiversity Information Science and Standards : Conference Abstract

Mario Lasseck ‡

‡ Museum für Naturkunde, Berlin, Germany

Corresponding author: Mario Lasseck (mario.lasseck@mfn.berlin)

Received: 14 Jun 2019 | Published: 19 Jun 2019

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Lasseck M (2019) Augmentation Methods for Biodiversity Training Data. Biodiversity Information Science and Standards 3: e37307. https://doi.org/10.3897/biss.3.37307

Abstract

The detection and identification of individual species from images or audio recordings have improved significantly over the last few years, thanks to recent advances in deep learning. Reliable automatic species recognition provides a promising tool for biodiversity monitoring, research and education. Image-based plant identification, for example, now comes close to the most advanced human expertise (Bonnet et al. 2018, Lasseck 2018a). Besides improved machine learning algorithms, neural network architectures, deep learning frameworks and computer hardware, a major reason for the gain in performance is the increasing abundance of biodiversity training data, either from observational networks and data providers such as GBIF, Xeno-canto and iNaturalist, or from natural history museum collections such as the Animal Sound Archive of the Museum für Naturkunde. However, in many cases these occurrence data are still insufficient for data-intensive deep learning approaches and are often unbalanced, with only a few examples for very rare species. Data augmentation can be used to overcome these limitations. This technique synthetically creates additional training samples by applying various subtle random manipulations to the original data in a label-preserving way, that is, without changing the content. In the talk, we will present augmentation methods for image and audio data. Their positive effect on identification performance will be evaluated on different large-scale data sets from recent plant and bird identification (LifeCLEF 2017, 2018) and detection (DCASE 2018) challenges (Lasseck 2017, Lasseck 2018b, Lasseck 2018c).
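
The general idea of label-preserving augmentation can be illustrated with a minimal sketch. The abstract does not specify which manipulations are used in the talk, so the transforms below (horizontal flip and brightness scaling for images; circular time shift, gain change and added noise for audio) are assumptions chosen for illustration, as are all function names and parameter ranges. The sketch uses NumPy only and is not the author's implementation.

```python
import numpy as np


def augment_image(image, rng):
    """Return a randomly perturbed copy of an H x W x C float image in [0, 1].

    Illustrative transforms only (assumed, not taken from the abstract).
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    gain = rng.uniform(0.9, 1.1)  # subtle brightness change
    return np.clip(out * gain, 0.0, 1.0)


def augment_audio(signal, rng, max_shift_fraction=0.1, noise_std=0.005):
    """Return a randomly perturbed copy of a 1-D float waveform.

    Illustrative transforms only (assumed, not taken from the abstract).
    """
    max_shift = int(len(signal) * max_shift_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(signal, shift)  # circular time shift
    out = out * rng.uniform(0.8, 1.2)  # random gain
    return out + rng.normal(0.0, noise_std, size=out.shape)  # background noise


def augment_dataset(samples, labels, augment, copies, seed=0):
    """Create `copies` augmented variants per sample; each keeps its label."""
    rng = np.random.default_rng(seed)
    aug_samples, aug_labels = [], []
    for sample, label in zip(samples, labels):
        for _ in range(copies):
            aug_samples.append(augment(sample, rng))
            aug_labels.append(label)  # label is unchanged by construction
    return aug_samples, aug_labels
```

With a helper of this kind, an unbalanced data set could be evened out by requesting more augmented copies for rare species than for common ones; whether a given transform is truly label-preserving depends on the task and would need to be checked per data set.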

Keywords

data augmentation, deep learning, species identification, audio & image recognition

Presenting author

Mario Lasseck

Presented at

Biodiversity_Next 2019

References

Bonnet P, Goëau H, Hang ST, Lasseck M, Šulc M, Malécot V, Jauzein P, Melet J, You C, Joly A (2018) Plant Identification: Experts vs. Machines in the Era of Deep Learning. Multimedia Tools and Applications for Environmental & Biodiversity Informatics: 131-149. https://doi.org/10.1007/978-3-319-76445-0_8

Lasseck M (2017) Image-based Plant Species Identification with Deep Convolutional Neural Networks. Working Notes of CLEF 2017.

Lasseck M (2018a) Machines vs. Human Experts: Contribution to the ExpertLifeCLEF 2018 Plant Identification Task. Working Notes of CLEF 2018.

Lasseck M (2018b) Audio-based Bird Species Identification with Deep Convolutional Neural Networks. Working Notes of CLEF 2018.

Lasseck M (2018c) Acoustic Bird Detection with Deep Convolutional Neural Networks. In: Plumbley MD, Kroos C, Bello JP, Richard G, Ellis DP, A. M (Eds) Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE 2018). Tampere University of Technology, pp. 143-147.
