Biodiversity Information Science and Standards :
Conference Abstract
|
Corresponding author: Damiano Oldoni (damiano.oldoni@inbo.be)
Received: 30 Sep 2020 | Published: 30 Sep 2020
© 2020 Damiano Oldoni, Quentin Groom, Peter Desmet
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Oldoni D, Groom Q, Desmet P (2020) Occurrence Cubes: A new way of aggregating heterogeneous species occurrence data. Biodiversity Information Science and Standards 4: e59154. https://doi.org/10.3897/biss.4.59154
|
The digital era has brought about an impressive increase in the volume of published species occurrence data. Research infrastructures such as the Global Biodiversity Information Facility (GBIF), the digitization of legacy data, and the use of mobile applications have all played a role in this transition. More data implies, unavoidably, more heterogeneity at multiple levels as a result of the different methods and standards used to collect data. Data standardization and aggregation help to reduce this heterogeneity. Furthermore, intermediate data products that can be used for activities such as mapping, modeling and monitoring improve the repeatability and reproducibility of biodiversity research (
Occurrences can be defined as events in a three-dimensional space where the dimensions are taxonomic (what), temporal (when) and spatial (where). They are then aggregated into what we coined occurrence cube (Fig.
Occurrence cubes (centre) are the result of aggregating species occurrence data from different sources and at different scales by taxon, time and space. It can be used as a data product for maps, models, indicators or networks.
The taxonomic dimension is categorical. Research infrastructures like GBIF use a taxonomic backbone, thus making data aggregation at species level or higher rank relatively easy. The temporal dimension is a continuum and the temporal uncertainty is usually lower than the typical aggregation span, typically a year. Regarding the spatial dimension, occurrences are typically filtered to remove those with too large an uncertainty to fit the grid scheme being used. Meaning that the spatial uncertainty is largely unused. We developed a method to take into account this spatial uncertainty while aggregating data. In particular, we state that an occurrence is spatially representable as a closed plane figure such as a circle, hexagon or square, never as the geometric centre (centroid) of it. As for GBIF occurrence data, the coordinateUncertaintyInMeters is defined as the radius describing the smallest circle containing the whole of the location (see Darwin Core standard). So, spatially speaking, we refer to occurrences as circles, even if the method described below is general.
After harvesting the occurrence data and providing a data quality assessment (e.g. removing occurrences without coordinates or with suspicious coordinates) we can assign occurrences to a reference grid such as the European reference grid of the European Environment Agency (EEA) at 1 km scale. In this spatial aggregation we randomly choose a point within the occurrence circle and assign it to the grid cell in which it is contained. We can aggregate further by time (e.g. by year) and taxonomy (e.g. by species), where aggregating means counting how many occurrences are in each specific taxonomic-spatial-temporal unit.
The analogy with geometry goes further: the occurrence cube can, as any cube, be projected on an orthogonal plane by aggregating along one of the three dimensions. In particular, projecting the cube on the taxonomic and temporal dimensions can be done by adding up the number of occurrences, or counting the number of occupied cells, thus estimating the area of occupancy.
The occurrence cube paradigm has been developed within the Tracking Invasive Alien Species (TrIAS) project (
methodology, data analysis, GBIF, species distribution modelling, indicators, TrIAS
Damiano Oldoni
TDWG 2020
BelSPO BR/165/A1/TrIAS