Biodiversity Information Science and Standards: Conference Abstract
PyDwCA: A Tool for Integrating Biodiversity Data
Juan M. Sáez-Hidalgo, Ricardo A. Segovia‡,§, Francisco A. Squeo‡,|, Pablo C. Guerrero‡,§
‡ Institute of Ecology and Biodiversity, Concepción, Chile
§ Departamento de Botánica, Universidad de Concepción, Concepción, Chile
| Universidad de La Serena, La Serena, Chile
¶ Millennium Institute Biodiversity of Antarctic and Sub-Antarctic Ecosystems, Santiago, Chile
Open Access

Abstract

The Darwin Core Archive (DwC-A) format, based on the Darwin Core standard (Wieczorek et al. 2012), facilitates the exchange, management, and integration of biodiversity data from multiple sources. Because archives share a common structure, datasets can be aggregated in community-supported infrastructures, merged in different combinations, meta-analyzed, and submitted to public repositories (Baker et al. 2014). DwC-As therefore serve as unifying archives for collective efforts that build on one another, such as biodiversity inventories at different spatial and taxonomic scales.

Here we describe PyDwCA*1,*2, a Python library for handling the "star schema" of DwC-A, in which a single core data file is linked to any number of extension files. The library reads compressed ZIP archives, parses the meta.xml descriptor they contain, and uses it to identify the core file and its extensions. It also provides Python classes for defining the core, the extensions, and the metadata file, so that a new archive can be created and written to a compressed ZIP file. In addition, PyDwCA implements functionality to select, filter, and merge DwC-A files.
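PyDwCA's own classes are not reproduced here. As a rough illustration of what reading the star schema involves, the following sketch uses only the Python standard library to open an archive, parse meta.xml, and load the core file and each extension. The names `read_dwca` and `DataFile` are hypothetical and are not part of PyDwCA's API; the sketch also assumes that meta.xml sits at the root of the ZIP file, and it ignores attributes such as `linesTerminatedBy` and constant-value (`default`) fields.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace used by meta.xml (Darwin Core text guide)
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}
# meta.xml writes separators as escape sequences such as "\t"
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r"}


@dataclass
class DataFile:
    """Hypothetical container for one core or extension file."""
    row_type: str
    location: str
    id_index: int  # <id> column for the core, <coreid> column for extensions
    terms: dict = field(default_factory=dict)  # column index -> term URI
    rows: list = field(default_factory=list)


def _load(archive: zipfile.ZipFile, element: ET.Element, id_tag: str) -> DataFile:
    """Read the data file described by a <core> or <extension> element."""
    sep = element.get("fieldsTerminatedBy", ",")
    sep = ESCAPES.get(sep, sep)
    quote = element.get("fieldsEnclosedBy", '"')
    # An empty fieldsEnclosedBy means values are never quoted
    opts = {"quotechar": quote} if quote else {"quoting": csv.QUOTE_NONE}
    skip = int(element.get("ignoreHeaderLines", "0"))
    location = element.find("dwc:files/dwc:location", NS).text
    text = archive.read(location).decode(element.get("encoding", "UTF-8"))
    rows = list(csv.reader(io.StringIO(text), delimiter=sep, **opts))[skip:]
    terms = {
        int(f.get("index")): f.get("term")
        for f in element.findall("dwc:field", NS)
        if f.get("index") is not None  # constant-value fields have no index
    }
    return DataFile(
        row_type=element.get("rowType"),
        location=location,
        id_index=int(element.find(f"dwc:{id_tag}", NS).get("index")),
        terms=terms,
        rows=rows,
    )


def read_dwca(path) -> tuple:
    """Return (core, extensions) as declared in the archive's meta.xml."""
    with zipfile.ZipFile(path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = _load(archive, meta.find("dwc:core", NS), "id")
        extensions = [
            _load(archive, ext, "coreid")
            for ext in meta.findall("dwc:extension", NS)
        ]
    return core, extensions
```

In this sketch, each extension's `id_index` points at its `<coreid>` column, which is what links its rows back to the core records.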

We present this new tool in the context of building the Chilean National Biodiversity Inventory (Fig. 1), but PyDwCA is a versatile technical solution applicable to other contexts in biodiversity informatics (e.g., integrating datasets from biological collections and sampling events). To illustrate how PyDwCA works, we present the step-by-step integration of the Chilean Catalogue of Vascular Plants (Rodriguez et al. 2018) into a taxonomic backbone provided by the Catalogue of Life (Banki 2024), filtered to the species with occurrences recorded for Chile in the Global Biodiversity Information Facility (GBIF) (GBIF.Org 2023).
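Filtering a star-schema archive has to keep the core and its extensions consistent: once a core record is dropped, any extension rows that reference it through their coreid column become orphans. A minimal sketch of that idea follows, assuming rows have already been loaded as dictionaries keyed by column name (a hypothetical in-memory representation, not PyDwCA's); `filter_star` is likewise a hypothetical helper.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def filter_star(
    core_rows: List[Row],
    extensions: Dict[str, List[Row]],
    keep: Callable[[Row], bool],
    id_key: str = "taxonID",
    coreid_key: str = "coreid",
) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Filter the core of a star-schema archive and drop orphaned extension rows.

    Extension rows are retained only if their coreid still points at a
    retained core record, so the filtered archive stays referentially intact.
    """
    kept_core = [row for row in core_rows if keep(row)]
    kept_ids = {row[id_key] for row in kept_core}
    kept_ext = {
        name: [row for row in rows if row[coreid_key] in kept_ids]
        for name, rows in extensions.items()
    }
    return kept_core, kept_ext
```

For the Catalogue of Life step, `keep` could be `lambda row: row["scientificName"] in gbif_names`, where `gbif_names` is a hypothetical set of names obtained from GBIF. Exact string matching is a simplification, since the same species may be written with different authorship or spelling in the two sources.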

Figure 1.

Data pipeline for the generation of the Chilean National Biodiversity Inventory. A) Acquisition of the species present in Chilean territory from the GBIF data platform. B) Download of the Catalogue of Life DwC-A and filtering of its species with PyDwCA, using the list obtained from GBIF. C) Exclusion of Tracheophyta species using PyDwCA. D) Generation, with PyDwCA, of the DwC-A of the Catalogue of Vascular Plants of Chile, which contains a curated list of the Tracheophyta species in Chile. E) Merging of both DwC-As to obtain the first version of the Chilean National Biodiversity Inventory.
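Steps C to E of the pipeline can be approximated in the same spirit. The sketch below assumes Taxon core rows held as dictionaries with a `phylum` column; it drops Tracheophyta from the Catalogue of Life rows, concatenates them with the curated vascular-plant rows, and writes the result as a core-only archive. `exclude_phylum`, `merge_cores`, and `write_taxon_archive` are hypothetical helpers, not PyDwCA functions. Prefixing identifiers with a source label is only one possible way to keep `taxonID` values from different archives from colliding; the merge strategy PyDwCA actually uses may differ.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
TEXT_NS = "http://rs.tdwg.org/dwc/text/"
# Identifier columns that must stay consistent after merging
ID_TERMS = ("taxonID", "parentNameUsageID", "acceptedNameUsageID")


def exclude_phylum(rows, phylum):
    """Step C (hypothetical helper): drop taxa assigned to the given phylum."""
    return [row for row in rows if row.get("phylum") != phylum]


def merge_cores(sources):
    """Step E (hypothetical helper): concatenate (label, rows) pairs,
    prefixing every identifier with its source label to avoid collisions."""
    merged = []
    for label, rows in sources:
        for row in rows:
            new = dict(row)  # copy so the input rows are not modified
            for term in ID_TERMS:
                if new.get(term):
                    new[term] = f"{label}:{new[term]}"
            merged.append(new)
    return merged


def write_taxon_archive(path, rows, terms):
    """Write rows as a core-only DwC-A; terms[0] must be "taxonID" (the core id)."""
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(terms)  # header line, skipped via ignoreHeaderLines="1"
    for row in rows:
        writer.writerow([row.get(term, "") for term in terms])

    ET.register_namespace("", TEXT_NS)  # serialize meta.xml with a default namespace
    archive = ET.Element(f"{{{TEXT_NS}}}archive")
    core = ET.SubElement(archive, f"{{{TEXT_NS}}}core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Taxon",
    })
    files = ET.SubElement(core, f"{{{TEXT_NS}}}files")
    ET.SubElement(files, f"{{{TEXT_NS}}}location").text = "taxon.txt"
    ET.SubElement(core, f"{{{TEXT_NS}}}id", {"index": "0"})
    for index, term in enumerate(terms):
        ET.SubElement(core, f"{{{TEXT_NS}}}field",
                      {"index": str(index), "term": DWC_TERMS + term})

    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", ET.tostring(archive, encoding="unicode"))
        zf.writestr("taxon.txt", data.getvalue())
```

The EML metadata file and any extensions are omitted for brevity; a complete merge would also apply the same prefixes to the coreid column of every extension.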

Keywords

Darwin Core Archive, Python, taxonomic inventory

Presenting author

Juan M. Sáez-Hidalgo

Presented at

SPNHC-TDWG 2024

Funding program

Centros Científicos y Tecnológicos de Excelencia con Financiamiento Basal; Institutos MILENIO

Grant title

Centro Basal Instituto de Ecología y Biodiversidad (ANID FB210006); Milenio BASE (grant ICN2021_002)

Hosting institution

Instituto de Ecología y Biodiversidad

Conflicts of interest

The authors have declared that no competing interests exist.

References

Endnotes
*1

PyDwCA library main page https://pypi.org/project/pydwca/

*2

PyDwCA GitHub repository https://github.com/IEB-BIODATA/pydwca
