Biodiversity Information Science and Standards :
Conference Abstract
|
Corresponding author: Vincent Stuart Smith (v.smith@nhm.ac.uk)
Received: 14 Sep 2023 | Published: 15 Sep 2023
© 2023 Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Salili-James A, Scott B, Livermore L, Price B, Dupont S, Hardy H, Smith VS (2023) AI-Accelerated Digitisation of Insect Collections: The next generation of Angled Label Image Capture Equipment (ALICE). Biodiversity Information Science and Standards 7: e112742. https://doi.org/10.3897/biss.7.112742
|
|
The digitisation of natural science specimens is a shared ambition of many of the largest collections, but the scale of these collections, estimated at at least 1.1 billion specimens (
The Natural History Museum, London (NHM) has been pioneering work to accelerate the digitisation of its 80 million specimens. Since the inception of the NHM Digital Collection Programme in 2014, more than 5.5 million specimen records have been made digitally accessible. This has enabled the museum to deliver a tenfold increase in digitisation, compared to when rates were first measured by the NHM in 2008. Even with this investment, it will take circa 150 years to digitise its remaining collections, leading the museum to pursue technology-led solutions alongside increased funding to deliver the next increase in digitisation rate.
Insects comprise approximately half of all described species and, at the NHM, represent more than one-third (c. 30 million specimens) of the NHM’s overall collection. Their most common preservation method, attached to a pin alongside a series of labels with metadata, makes insect specimens challenging to digitise. Early Artificial Intelligence (AI)-led innovations (
With the ALICE setup, a user might average imaging 800 digitised specimens per day, and exceptionally, up to 1,300. This compares with an average of 250 specimens or fewer daily, using more traditional methods involving separating the labels and photographing them off of the pin. Despite this, our original version of ALICE was only suited to a small subset of the collection. In situations when the specimen is very large, there are too many labels, or these labels are too close together, ALICE fails (
Using a combination of updated AI processing tools, we hereby present ALICE version 2. This new version of ALICE provides faster rates, improved software accuracy, and a more streamlined pipeline. It includes the following updates:
Hardware: after conducting various tests, we have optimised the camera setup. Further hardware updates include a Light-Emitting Diode (LED) ring light, as well as modifications to the camera mounting.
Software: our latest software incorporates machine learning and other computer vision tools to segment labels from ALICE images and stitch them together more quickly and with a higher level of accuracy, significantly reducing the image processing failure rate. These processed label images can be combined with the latest OCR tools for automatic transcription and data segmentation.
Buildkit: we aim to provide a toolkit that any individual or institution can incorporate into their digitisation pipeline. This includes hardware instructions, an extensive guide detailing the pipeline, and new software code accessible via Github.
We provide test data and workflows to demonstrate the potential of ALICE version 2 as an effective, accessible, and cost-saving solution to digitising pinned insect specimens. We also describe potential modifications, enabling it to work with other types of specimens.
entomology, biodiversity, data, machine learning, computer vision
Vincent Stuart Smith
TDWG 2023