From Shells in House Cabinets to Structured Data for Research: The mobilization of frozen biodiversity data in Italy
Arianna Giannini, Marco Oliverio
‡ Department of Biology and Biotechnology “Charles Darwin”, La Sapienza University, Rome, Italy
In recent decades, technological development has accelerated exponentially, and with it the volume of data that can be accumulated and processed (Runting et al. 2020). The big data revolution has enabled great steps forward in the natural sciences, allowing global changes to be studied at different scales (Nelson and Ellis 2018). Biodiversity research now focuses more on data quantity than on data quality, and this has shifted the collection of primary biodiversity data from specimen-based to observation-based methods. Some authors have argued that this growing disconnection of occurrence data from physical specimens has drawbacks that cannot be ignored, despite its many benefits (Troudet et al. 2018). In this context, Natural History Collections (NHCs) hold potentially high-quality data, provided that the specimens were collected and identified by experts. However, most NHC data are not databased: records must be digitized before researchers and other stakeholders can use them, and not all collection owners have the tools to do so (Fig. 1). In Italy, as in other countries, many invertebrate specimens are stored in private collections, most of which are not databased; even when they are digitized, they rarely follow international standards such as Darwin Core (DwC; Darwin Core Task Group 2009). We call such data "frozen". An accessible nationwide database built by digitizing these records could significantly support research and national conservation strategies.

This project aims to support the databasing of private collections in Italy and to gather their records into a single structured, geo- and chrono-referenced database of biodiversity data that follows international standards. We chose marine molluscs as a pilot taxon based on three criteria: 1) the existence of an updated checklist of the Italian fauna (Renda et al. 2022); 2) the existence of an updated taxonomic reference to serve as a thesaurus for the database, namely MolluscaBase (MolluscaBase eds. 2022) and the World Register of Marine Species (WoRMS; WoRMS Editorial Board 2022); and 3) the management and conservation relevance of the taxon, based on classic criteria for selecting indicator taxa (e.g., Pearson 1994). For data collection, we built a blank Excel template spreadsheet designed to be easy for the end user to fill in. The template contains 21 fields, summarized in Fig. 2, and is accompanied by supporting files (Fig. 3). As of 1 July 2022, we had contacted only a small number of specialists and had collected more than 9,500 records. Because the data come from different collections, the records will be reorganized into a single database according to the DwC standard. Each record will then be georeferenced following the protocol of Zermoglio et al. (2020) and made traceable through a system of persistent identifiers. Through this project, we aim to foster the mobilization of frozen biodiversity data by digitizing and integrating different sources. We expect to produce a database containing a large number of records within a few years and to make it available for research and biodiversity management.
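The reorganization step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: it assumes each contributed spreadsheet row has already been loaded into a Python dictionary, that coordinates were transcribed from labels as degrees-minutes-seconds strings, and that a random UUID is acceptable as a persistent `occurrenceID`. The input column names (`species`, `lat_dms`, `lon_dms`, `collection`) and the helper functions are hypothetical; only the output keys are Darwin Core terms. A full georeference following Zermoglio et al. (2020) would also record the geodetic datum and a coordinate uncertainty, which this sketch does not compute.

```python
import re
import uuid

# Hypothetical label format, e.g. 41°53'24"N or 12°29'32"E.
DMS_PATTERN = re.compile(
    r"""^\s*(\d+)\s*°\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*"\s*([NSEW])\s*$"""
)


def dms_to_decimal(dms: str) -> float:
    """Convert a degrees-minutes-seconds string to signed decimal degrees."""
    match = DMS_PATTERN.match(dms)
    if match is None:
        raise ValueError(f"Unrecognized coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    if int(minutes) >= 60 or float(seconds) >= 60:
        raise ValueError(f"Minutes or seconds out of range: {dms!r}")
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # Southern latitudes and western longitudes are negative in decimal degrees.
    if hemisphere in ("S", "W"):
        value = -value
    return round(value, 6)


def to_dwc(row: dict[str, str]) -> dict[str, object]:
    """Reshape one hypothetical template row into a Darwin Core occurrence."""
    return {
        # Assumption: a random UUID URN serves as the persistent identifier.
        "occurrenceID": f"urn:uuid:{uuid.uuid4()}",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": row["species"].strip(),
        "decimalLatitude": dms_to_decimal(row["lat_dms"]),
        "decimalLongitude": dms_to_decimal(row["lon_dms"]),
        "collectionCode": row["collection"],
    }


def merge_collections(*sources: list[dict[str, str]]) -> list[dict[str, object]]:
    """Combine rows from several contributed spreadsheets into one list."""
    return [to_dwc(row) for source in sources for row in source]
```

For example, a label latitude of 41°53'24"N becomes 41.89 in decimal degrees, and a western longitude becomes negative. Because `uuid.uuid4()` produces a new value on every run, identifiers in a real workflow would need to be minted once and stored with the record so that they actually persist.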

Figure 1.

Features of four different sources of occurrence data: 1) public collections, 2) private collections, 3) structured citizen science projects (i.e., projects where occurrences are combined into a single database), and 4) other observation data (e.g., scattered data from online sources).

Figure 2.

The 21 fields (categories) of the template file, which contain the information requested from specialists. Each category is associated with a DwC class for reference. Note that the fields do not always match DwC terms, because the file is used only to collect data.
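The distinction drawn in Fig. 2 between template fields and DwC terms can be illustrated with a hypothetical mapping table. The field names below are invented examples, not the 21 fields of the actual template, and the class assignments are assumptions made for illustration. A field that maps to exactly one DwC term can be copied across directly, while a field such as a free-text depth range must first be split into two terms (`minimumDepthInMeters` and `maximumDepthInMeters`).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Links one (hypothetical) template column to Darwin Core."""

    template_field: str
    dwc_class: str
    dwc_terms: tuple[str, ...]


# Hypothetical subset of template fields; not the project's actual list.
TEMPLATE_MAPPINGS = [
    FieldMapping("Species", "Taxon", ("scientificName",)),
    FieldMapping("Locality", "Location", ("locality",)),
    FieldMapping("Depth", "Location", ("minimumDepthInMeters", "maximumDepthInMeters")),
    FieldMapping("Date", "Event", ("eventDate",)),
    FieldMapping("Collector", "Occurrence", ("recordedBy",)),
    FieldMapping("Number of specimens", "Occurrence", ("individualCount",)),
    FieldMapping("Identified by", "Identification", ("identifiedBy",)),
]

# Accepts values like "12 m" or "10-20 m" (a hypothetical template convention).
DEPTH_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*m\s*$")


def needs_transformation(mapping: FieldMapping) -> bool:
    """True when a field does not map one-to-one onto a single DwC term."""
    return len(mapping.dwc_terms) != 1


def fields_by_class(mappings: list[FieldMapping]) -> dict[str, list[str]]:
    """Group template fields under the DwC class they are associated with."""
    grouped: dict[str, list[str]] = {}
    for mapping in mappings:
        grouped.setdefault(mapping.dwc_class, []).append(mapping.template_field)
    return grouped


def split_depth(value: str) -> dict[str, float]:
    """Split a depth string into DwC minimum and maximum depth terms."""
    match = DEPTH_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Unrecognized depth: {value!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return {
        "minimumDepthInMeters": min(low, high),
        "maximumDepthInMeters": max(low, high),
    }
```

With this table, `needs_transformation` flags only the hypothetical "Depth" field, and `split_depth("10-20 m")` yields a minimum depth of 10.0 m and a maximum of 20.0 m.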

Figure 3.

The three files used for data collection.


