Biodiversity Information Science and Standards : Conference Abstract
Use of European Open Science Cloud and National e-Infrastructures for the Long-Term Storage of Digitised Assets from Natural History Collections
Abraham Nieva de la Hidalga‡, Nicolas Cazenave§, Donat Agosti|, Zhengzhe Wu¶, Mathias Dillen#, Lars H Nielsen¤
‡ Cardiff University School of Computer Science and Informatics, Cardiff, United Kingdom
§ CINES, Montpellier, France
| www.plazi.org, Bern, Switzerland
¶ University of Helsinki, Helsinki, Finland
# Meise Botanic Garden, Meise, Belgium
¤ CERN/Zenodo, Geneva, Switzerland
Open Access

Abstract

Digitisation of Natural History Collections (NHC) has evolved from transcription of specimen catalogues into databases to web portals providing access to data, digital images, and 3D models of specimens. These portals increase global accessibility to specimens and help preserve the physical specimens by reducing their handling. The size of NHC requires the development of high-throughput digitisation workflows, as well as research into novel acquisition systems, image standardisation, curation, preservation, and publishing. Herbarium sheet digitisation workflows (and fast digitisation stations) can now digitise up to 6,000 specimens per day, and operating several stations in parallel can increase that capacity further. The volume of high-resolution images obtained from these specimens requires substantial bandwidth, disk space and tapes for storing the original digitised materials, and computational resources for generating derivatives, extracting information, and publishing. While large institutions have dedicated digitisation teams that manage the whole workflow from acquisition to publishing, other institutions cannot dedicate resources to support all digitisation activities, in particular long-term storage. National and European e-infrastructures can provide an alternative solution by supporting different parts of the digitisation workflows. In the context of the Innovation and consolidation for large scale digitisation of natural heritage project (ICEDIG Project 2018), three e-infrastructures providing long-term storage have been analysed through three pilot studies: EUDAT-CINES, Zenodo, and National Infrastructures.
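The link between digitisation throughput and storage demand can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the 6,000 specimens per day figure comes from the abstract, while the number of parallel stations, the average image size, and the number of working days per year are assumed values chosen for the example, not figures reported by the pilots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationPlan:
    """Hypothetical planning parameters for a digitisation line."""
    specimens_per_day: int      # per station; 6,000 is the upper figure quoted in the abstract
    stations: int               # stations run in parallel (assumed)
    image_size_mb: float        # average master image size in MB (assumed)
    working_days_per_year: int  # assumed


def daily_specimens(plan: StationPlan) -> int:
    """Specimens digitised per day across all parallel stations."""
    return plan.specimens_per_day * plan.stations


def yearly_storage_tb(plan: StationPlan) -> float:
    """Storage needed per year for master images, in decimal terabytes."""
    total_mb = daily_specimens(plan) * plan.image_size_mb * plan.working_days_per_year
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB


# Example values: only specimens_per_day is taken from the abstract.
plan = StationPlan(specimens_per_day=6000, stations=2,
                   image_size_mb=100.0, working_days_per_year=200)
print(daily_specimens(plan))    # 12000
print(yearly_storage_tb(plan))  # 240.0
```

Under these assumed values, two stations would produce about 240 TB of master images per year, before any derivatives are generated. This is the order of magnitude at which long-term storage becomes a planning concern for smaller institutions.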

The EUDAT-CINES pilot centred on transferring large digitised herbarium collections from the National Museum of Natural History, France (MNHN) to the storage infrastructure provided by the Centre Informatique National de l’Enseignement Supérieur (CINES 2014), a European trusted digital repository. Upload, processing, and access were supported by a combination of services provided by the European Collaborative Data Infrastructure (EUDAT CDI 2019) and CINES. The Zenodo pilot included the upload of herbarium collections from Meise Botanic Garden (APM) and other European herbaria into the Zenodo repository (Zenodo 2019). Upload, processing, and access were supported by Zenodo services, accessed by APM. The National Infrastructures pilot facilitated the upload of digital assets derived from specimens of herbarium and entomology collections held at the Finnish Museum of Natural History (LUOMUS) into the Finnish Biodiversity Information Facility (FinBIF 2019). This pilot concentrated on simplifying the integration of digitisation facilities with Finnish national e-infrastructures, using services developed by LUOMUS to access FinBIF resources.

The data models employed in the pilots allow data schemas to be defined according to the types of collection and specimen images stored. For EUDAT-CINES, each deposit was composed of the specimen data and its business metadata (the metadata that the depositing institution, in this case MNHN, considers relevant for the data objects being stored), enhanced by archiving metadata added during the archiving process (institution, licensing, identifiers, project, archiving date, etc.). EUDAT uses ePIC identifiers (ePIC 2019) to identify each deposit. The Zenodo pilot was designed to allow the definition of specimen data and metadata that support indexing of and access to resources. Zenodo uses DataCite Digital Object Identifiers (DOI) and the underlying data types as the main identifiers for the resources, augmented with fields based on standard TDWG vocabularies. FinBIF compiles Finnish biodiversity information into a single service for open access sharing. In FinBIF, HTTP-URI-based identifiers are used for all data, linking the specimen data with other information, such as images.
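The layering of specimen data, business metadata, and archiving metadata, together with the identifier scheme used by each infrastructure, might be modelled roughly as follows. This is a minimal sketch and not the schema used in any of the pilots: the class names, field names, the `archive` helper, and all example values (identifier, barcode, licence, date) are hypothetical placeholders. Only the three identifier schemes and the kinds of archiving metadata are taken from the description above.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class IdentifierScheme(Enum):
    EPIC = "ePIC identifier"    # used by EUDAT
    DOI = "DataCite DOI"        # used by Zenodo
    HTTP_URI = "HTTP URI"       # used by FinBIF


# Identifier scheme per pilot, as described in the text.
PILOT_SCHEMES = {
    "EUDAT-CINES": IdentifierScheme.EPIC,
    "Zenodo": IdentifierScheme.DOI,
    "FinBIF": IdentifierScheme.HTTP_URI,
}


@dataclass(frozen=True)
class Deposit:
    """Hypothetical deposit record; field names are illustrative only."""
    pilot: str
    identifier: str
    specimen_data: dict
    business_metadata: dict                                  # chosen by the depositing institution
    archiving_metadata: dict = field(default_factory=dict)   # added during archiving

    @property
    def scheme(self) -> IdentifierScheme:
        return PILOT_SCHEMES[self.pilot]


def archive(deposit: Deposit, institution: str, licence: str, archiving_date: str) -> Deposit:
    """Return a copy of the deposit enriched with archiving metadata."""
    extra = {"institution": institution, "licence": licence,
             "archiving_date": archiving_date}
    return replace(deposit, archiving_metadata={**deposit.archiving_metadata, **extra})


# All values below are placeholders, not real identifiers or licences.
deposit = Deposit(pilot="EUDAT-CINES", identifier="example-identifier-0001",
                  specimen_data={"barcode": "EXAMPLE0001"},
                  business_metadata={"collection": "herbarium"})
archived = archive(deposit, institution="MNHN", licence="example-licence",
                   archiving_date="2019-01-01")
print(archived.scheme.value)
print(archived.archiving_metadata)
```

Keeping the archiving metadata in a separate mapping mirrors the description above, where it is added on top of what the institution deposits without altering the deposit itself.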

The pilot infrastructure design reports describe the features, capacities, functions, and costs of each model in three specific contexts that are relevant to the implementation of the Distributed System of Scientific Collections (DiSSCo 2019) research infrastructure, informing the options for long-term storage and archiving of digitised specimen data. The explored options allow preservation of assets and support easy access. In a wider context, the results provide a template for service evaluation in the European Open Science Cloud (EOSC 2019), which can guide similar efforts.

Keywords

European Open Science Cloud, e-infrastructure, long-term storage, federated cloud, digitisation, natural history collections, ICEDIG, EUDAT, Zenodo

Presenting author

Abraham Nieva de la Hidalga

Presented at

Biodiversity_Next 2019

Funding program

Horizon 2020 Framework Programme of the European Union

Grant title

ICEDIG – “Innovation and consolidation for large scale digitisation of natural heritage” H2020-INFRADEV-2016-2017 – Grant Agreement No. 777483

Hosting institution

EUDAT, CINES, Zenodo, Plazi, Meise Botanic Garden, University of Helsinki

References