Biodiversity Information Science and Standards: Conference Abstract
Semantic Annotation of Botanical Collection Data
Dominik Röpert, Fabian Reimeier, Jörg Holetschek, Anton Güntsch
Botanic Garden and Botanical Museum Berlin, Berlin, Germany
Open Access

Abstract

Herbarium specimens have been digitized at the Botanic Garden and Botanical Museum Berlin (BGBM) since 2000. During digitization, a basic set of data elements was recorded manually for each specimen; additional elements were usually added later from the digital images.

Over the last twenty years, label data were transcribed exactly as written on the labels, a common practice in European herbaria. This approach has produced a large number of orthographic variants, especially of person and place names.

To improve interoperability between records within our own collection database and across the collection databases provided by the community, we have begun enriching our metadata with Linked Open Data (LOD) links to semantic resources, starting with collectors and geographic entities. The preferred resources for semantic enrichment (e.g., Wikidata, GeoNames) were agreed on by members of the Consortium of European Taxonomic Facilities (CETAF) in order to make the best possible use of semantically enriched collection data.
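The kind of enrichment described above can be pictured as storing resolvable URIs next to the verbatim label text rather than overwriting it. The following is a minimal sketch under stated assumptions: the record fields, the `annotate` helper, and all identifiers are hypothetical and do not reflect the BGBM database schema; only the Wikidata and GeoNames entity URI patterns are taken from those resources.

```python
from dataclasses import dataclass
from typing import Optional

# Entity URI prefixes of the two resources named in the text.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
GEONAMES_ENTITY = "https://sws.geonames.org/"


@dataclass
class SpecimenRecord:
    """Verbatim-transcribed label data plus optional LOD links.

    Field names are hypothetical and chosen only for this illustration.
    """
    catalog_number: str
    collector_verbatim: str
    locality_verbatim: str
    collector_uri: Optional[str] = None
    locality_uri: Optional[str] = None


def annotate(record: SpecimenRecord,
             wikidata_qid: Optional[str] = None,
             geonames_id: Optional[str] = None) -> SpecimenRecord:
    """Attach entity URIs while leaving the verbatim transcription untouched."""
    if wikidata_qid:
        record.collector_uri = WIKIDATA_ENTITY + wikidata_qid
    if geonames_id:
        record.locality_uri = GEONAMES_ENTITY + geonames_id + "/"
    return record


# Placeholder values only; these are not real specimen data or mappings.
example = annotate(
    SpecimenRecord("HYPOTHETICAL-001", "Collector A", "Locality X"),
    wikidata_qid="Q_PLACEHOLDER",
    geonames_id="0000",
)
```

Keeping the verbatim strings alongside the links preserves the transcription practice described earlier while still letting records that spell a collector differently share the same URI.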

To annotate many collection records in a relatively short time, we prioritized concepts (e.g., specific collector names) that occur on many specimen labels and already have an easy-to-find representation in an external semantic resource. With this approach, a student assistant annotated 52,000 specimen records in just a few weeks of working time.
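The prioritization rule above (frequent on labels, and already represented externally) can be sketched as a frequency ranking filtered by known matches. This is an illustrative sketch, not the procedure used at the BGBM: the `name_key` normalization (strip accents and punctuation, lower-case, sort tokens) is an assumed heuristic for counting trivial spelling variants together, and `known_keys` stands in for whatever lookup against an external resource is actually used.

```python
import re
import unicodedata
from collections import Counter


def name_key(name: str) -> str:
    """Crude, assumed normalization so trivial variants share one key.

    Strips accents and punctuation, lower-cases, and sorts the tokens,
    so "Müller, J." and "J. Muller" both become "j muller".
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_like = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"\w+", ascii_like.lower())
    return " ".join(sorted(set(tokens)))


def prioritize(labels: list[str], known_keys: set[str]) -> list[tuple[str, int]]:
    """Rank normalized collector names by label frequency, keeping only
    names that already have a (hypothetical) external match."""
    counts = Counter(name_key(label) for label in labels)
    ranked = [(key, n) for key, n in counts.items() if key in known_keys]
    # Most frequent first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical label strings and matches, for illustration only.
labels = ["Müller, J.", "J. Muller", "Smith, A.", "Doe", "Smith A"]
known = {"j muller", "a smith"}
ranking = prioritize(labels, known)
```

Here `ranking` is `[("a smith", 2), ("j muller", 2)]`; "Doe" is skipped because it has no external match yet. A key this coarse can also merge different people who share a surname and initial, so in practice candidates would still need review before annotation.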

Our semantic annotation workflows are integrated with other data integration, cleaning, and import processes at the BGBM through an OpenRefine-based platform with specific extensions for services and functions related to label transcription (Kirchhoff et al. 2018).
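The abstract does not say which OpenRefine mechanism the BGBM extensions rely on. One standard way OpenRefine links cell values to external entities is its reconciliation protocol, in which a batch of named queries is sent to a service and each query returns scored candidates, some flagged as a confident `match`. The sketch below assumes that protocol; it only builds the query payload and reads a response, omits the HTTP call, and uses hypothetical helper names and placeholder identifiers ("Q5" is Wikidata's class for humans).

```python
import json
from typing import Optional


def build_queries(values: list[str], entity_type: Optional[str] = None,
                  limit: int = 3) -> str:
    """Serialize label values as a reconciliation `queries` batch.

    Keys q0, q1, ... let each result be mapped back to its input row.
    """
    batch = {}
    for i, value in enumerate(values):
        query = {"query": value, "limit": limit}
        if entity_type:
            query["type"] = entity_type  # restrict candidates to one class
        batch[f"q{i}"] = query
    return json.dumps(batch)


def pick_match(response: dict, key: str) -> Optional[str]:
    """Return the id of the candidate flagged as a match, if any.

    Candidates without the flag are left for manual review rather than
    being accepted automatically.
    """
    for candidate in response.get(key, {}).get("result", []):
        if candidate.get("match"):
            return candidate["id"]
    return None
```

For example, `build_queries(["Collector A"], entity_type="Q5")` yields a payload with a single key `q0`, and `pick_match` returns an identifier only when the service has flagged a candidate as a match, which keeps uncertain links out of the annotated records.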

Our semantically enriched collection data will contribute to a “Botany Pilot,” which is presently being developed by member organizations of CETAF to demonstrate the potential of Linked Open Collection Data and their integration with existing semantic resources.

Keywords

Linked Open Data, LOD, semantic web, Wikidata

Presenting author

Dominik Röpert

Presented at

Biodiversity_Next 2019

References
