Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: William N Weaver (Willwe@umich.EDU)
Received: 20 Sep 2023 | Published: 21 Sep 2023
© 2023 William Weaver, Kyle Lough, Stephen Smith, Brad Ruhfel
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Weaver WN, Lough K, Smith SA, Ruhfel B (2023) The Future of Natural History Transcription: Navigating AI advancements with VoucherVision and the Specimen Label Transcription Project (SLTP). Biodiversity Information Science and Standards 7: e113067. https://doi.org/10.3897/biss.7.113067
Natural history collections are critical reservoirs of biodiversity information, but collections staff constantly grapple with substantial backlogs and limited resources. Transcribing specimen label text into searchable databases requires a significant amount of time, manual labor, and funding. To address this challenge, we introduce VoucherVision, a tool harnessing the capabilities of several Large Language Models (LLMs;
Integration of VoucherVision with the University of Michigan Herbarium’s transcription workflow resulted in a significant reduction in per-image transcription time, suggesting considerable potential benefits for collections workflows. VoucherVision marks a promising step towards efficient digitization, with curatorial staff playing critical roles in data quality assurance and process oversight. Emphasizing the importance of knowledge sharing, the University of Michigan Herbarium is backing the Specimen Label Transcription Project (SLTP), which will provide open access to benchmarking datasets, fine-tuned models, and validation tools to rank the performance of different methodologies, LLMs, and prompting strategies. In the rapidly evolving landscape of Artificial Intelligence (AI) development, we recognize the potential of diverse contributions and innovative methodologies to transform curatorial practices and accelerate digitization in natural history collections.
An early, public version of VoucherVision is available to try here: https://vouchervision.azurewebsites.net/
Keywords: large language models, herbarium, specimen digitization, natural language processing
Presenting author: William Weaver
Presented at: TDWG 2023
We thank the University of Michigan Herbarium for providing specimens, labels, and transcription data.