Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Carlos A. Martínez-Muñoz (archilegt@gmail.com)
Received: 18 Aug 2022 | Published: 23 Aug 2022
© 2022 Carlos Martínez-Muñoz, Dorothee Huff, Marie Meister, Christine Driller
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Martínez-Muñoz CA, Huff D, Meister M, Driller C (2022) Mobilizing and Enhancing Legacy Biodiversity Data: The case of Karl Wilhelm Verhoeff's correspondence. Biodiversity Information Science and Standards 6: e93679. https://doi.org/10.3897/biss.6.93679
A considerable amount of biological data is preserved as physical documents, the legacy of former explorers, collectors, researchers, and others. Mobilizing data from handwritten documents has been considered particularly challenging, with well-known cases such as the manual transcription of specimen labels and herbarium sheets by museum staff, or crowdsourced transcription of data card collections through online platforms.
Here we present a pipeline of open-source software that can be used to
We based our use case on the correspondence of the German zoologist Karl Wilhelm Verhoeff, related to the Myriapoda collection held at the Musée Zoologique de Strasbourg.
The documents were processed with Transkribus (
As a next step we are planning to subject the corrected text from Transkribus to a specific text-preprocessing workflow combining natural language processing (NLP) and machine learning (ML) techniques (
We recommend our comprehensive approach to natural history institutions seeking to efficiently digitize and mobilize the rich biological data present in their archival documents.
biodiversity informatics, Chilopoda, Diplopoda, handwritten text recognition
Carlos A. Martínez-Muñoz
TDWG 2022
Deutsche Forschungsgemeinschaft (DFG) - Project number 326061700, Ministerium für Wissenschaft, Forschung und Kunst (MWK) Baden-Württemberg - Project OCR-BW (2019-2022)