Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Mathias Dillen (mathias.dillen@plantentuinmeise.be)
Received: 11 Jun 2019 | Published: 18 Jun 2019
© 2019 Mathias Dillen, Quentin Groom, Sarah Phillips, Irena Spasic
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Dillen M, Groom Q, Phillips S, Spasic I (2019) Next Steps in Data Capture from Specimen Labels and Data Integration: Lessons learnt from the ICEDIG pilots. Biodiversity Information Science and Standards 3: e37081. https://doi.org/10.3897/biss.3.37081
The rapid development and refinement of digital technologies over the last two decades has driven a wave of digitization in natural history collections. This has generated a massive number of digitized images, and many more are expected with the planned European Distributed System of Scientific Collections (DiSSCo) infrastructure. Many of these images will contain labels bearing data that are handwritten, typed or otherwise interpretable from them, but capturing these data remains a challenge. Both automated and manual methods are being investigated, with mixed results so far. In addition, both previously captured data and data yet to be captured in this way will need to be interoperable to make digital access and enrichment most effective. Finally, institutions holding the physical specimens will need to remain capable of efficiently curating the digital, potentially annotated, counterparts. This will require compatibility with the diverse data models of local Collection Management Systems (CMS).
In the context of the ICEDIG (Innovation and consolidation for large scale digitisation of natural heritage) project, a benchmark dataset of herbarium specimens was assembled from nine contributing institutions (
The benchmark dataset was also processed through multiple crowdsourcing platforms, after which the quality and interoperability of the resulting transcriptions were analyzed. The capacity of local Collection Management Systems to curate these digitized specimens efficiently was investigated, as was the fitness of the data standards in use for ensuring and maintaining proper interoperability. In addition, available surveys on CMS use and satisfaction were summarized, and in-depth assessments of the CMS in use at the ICEDIG partner institutes were performed. A summary of results and recommendations will be presented.
automation, machine learning, data capture, interoperability, collection management system
Mathias Dillen
Biodiversity_Next 2019
Horizon 2020 Framework Programme of the European Union
ICEDIG – “Innovation and consolidation for large scale digitisation of natural heritage” H2020-INFRADEV-2016-2017 – Grant Agreement No. 777483