Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Omiros Pantazis (omiros.pantazis.16@ucl.ac.uk)
Received: 06 Sep 2021 | Published: 07 Sep 2021
© 2021 Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Pantazis O, Brostow GJ, Jones K, Mac Aodha O (2021) Reducing Manual Supervision Required for Biodiversity Monitoring with Self-Supervised Learning. Biodiversity Information Science and Standards 5: e74047. https://doi.org/10.3897/biss.5.74047
Recent years have ushered in a vast array of different types of low-cost and reliable sensors that are capable of capturing large quantities of audio and visual information from the natural world. In the case of biodiversity monitoring, camera traps (i.e. remote cameras that take images when movement is detected) have made it possible to collect large volumes of image data, which must then be reviewed to extract useful biodiversity information.
Until recently, this review process was an extremely time-consuming endeavor. It required domain experts to manually inspect each image to determine whether an animal is present and, if so, to identify the species.
The effectiveness of deep neural networks at image classification has made it possible to automate large parts of this process, provided that sufficient labeled training data is available.
However, camera trap images exhibit unique challenges that are typically not present in standard benchmark datasets used in computer vision. For example, objects of interest are often heavily occluded, the appearance of a scene can change dramatically over time due to changes in weather and lighting, and while the overall number of images can be large, the variation in locations is often limited.
Self-supervised learning is a paradigm in machine learning that attempts to forgo the need for manual supervision by instead learning informative representations from images directly, e.g. by transforming an image in two different ways that do not alter the semantics of the depicted object, and then learning by imposing similarity between the representations of the two transformed views. This is a tantalizing proposition for camera trap data, as it has the potential to drastically reduce the amount of time required to annotate data. The current performance of these methods on standard computer vision benchmarks is encouraging, as it suggests that self-supervised models have begun to reach the accuracy of their fully supervised counterparts for tasks like classifying everyday objects in images.
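As a concrete illustration of this paradigm, the sketch below implements a minimal SimCLR-style contrastive training step in PyTorch: two random augmentations of each image are embedded, and a normalized temperature-scaled cross-entropy (NT-Xent) loss pulls the two views of the same image together while pushing apart views of different images. The backbone, augmentation choices, and hyperparameters here are illustrative assumptions, not the exact configuration evaluated in this work.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Semantics-preserving augmentations: two random "views" of the same image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

# Backbone with a projection head (simplified here to a single linear layer).
encoder = models.resnet50(weights=None)
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 128)

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss over a batch of paired embeddings (N x D)."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N x D, unit norm
    sim = z @ z.t() / temperature                       # pairwise similarities
    # Exclude each embedding's similarity with itself.
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # The positive for view i is the other augmentation of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

def train_step(images, optimizer):
    """One self-supervised step on a batch of unlabeled camera trap images."""
    v1 = torch.stack([augment(img) for img in images])
    v2 = torch.stack([augment(img) for img in images])
    loss = nt_xent_loss(encoder(v1), encoder(v2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that no species labels appear anywhere in this loop: the only supervisory signal is the requirement that two augmented views of the same image map to nearby points in the embedding space.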
To this end, we explore the effectiveness of self-supervised learning when applied to camera trap imagery. We show that these methods can be used to train image classifiers with a significant reduction in manual supervision. Furthermore, we extend this analysis by showing that, with some careful design considerations, off-the-shelf self-supervised methods can be made to learn even more effective image representations for automated species classification. We show that exploiting cues at training time related to where and when a given image was captured can result in further improvements in classification performance; a sketch of this idea follows below. We demonstrate, across several different camera trapping datasets, that it is possible to achieve similar, and sometimes even superior, accuracy to fully supervised transfer learning-based methods using ten times less manual supervision. Finally, we discuss some of the limitations of the outlined approaches and their implications for automated species classification from images.
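To make the use of spatiotemporal cues more concrete, the sketch below shows one simple way such context could be exploited: images captured by the same camera within a short time window are very likely to depict the same species, so they can serve as additional positive pairs in a contrastive loss like the one sketched above. The record format, field names, and the 30-minute window are illustrative assumptions rather than the exact procedure used in this work.

```python
from datetime import datetime, timedelta

def context_positive_pairs(records, max_gap_minutes=30):
    """Yield index pairs (i, j) of images captured by the same camera within
    max_gap_minutes of each other; such images very likely contain the same
    species and can be used as extra positives during contrastive training."""
    by_location = {}
    for idx, rec in enumerate(records):
        by_location.setdefault(rec["location_id"], []).append((rec["timestamp"], idx))
    max_gap = timedelta(minutes=max_gap_minutes)
    for shots in by_location.values():
        shots.sort()  # chronological order per camera
        for (t0, i), (t1, j) in zip(shots, shots[1:]):
            if t1 - t0 <= max_gap:
                yield i, j

# Example metadata records (hypothetical field names).
records = [
    {"location_id": "cam_07", "timestamp": datetime(2021, 3, 1, 21, 4)},
    {"location_id": "cam_07", "timestamp": datetime(2021, 3, 1, 21, 9)},
    {"location_id": "cam_12", "timestamp": datetime(2021, 3, 2, 5, 30)},
]
print(list(context_positive_pairs(records)))  # -> [(0, 1)]
```

Once a representation has been learned this way, a common evaluation protocol is to train a lightweight classifier (e.g. a single linear layer) on top of the frozen features using only a small labeled subset, which is one way a tenfold reduction in labeling effort can be realized.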
Keywords: computer vision, deep learning, camera traps
Presenting author: Omiros Pantazis
Presented at: TDWG 2021