Proceedings of TDWG: Conference Abstract
Quantifying quality: the "Apparent Quality Index", a measure of data quality for occurrence datasets
Francisco Pando
‡ Real Jardin Botanico-CSIC, Madrid, Spain
Open Access

Abstract

When making an initial assessment of a dataset from an unfamiliar source, a user typically relies on the visible properties of the dataset as a whole, such as its title, publisher, and size. Aspects of data quality usually remain out of view, apart from some intuitions and hard-to-compare assertions. In 2007, at GBIF Spain, we set out to correct that by developing an index that enables a user to assess the quality of Darwin Core datasets published by GBIF Spain and to track improvements in quality over time. Our goal was to create an index that is explicit, easy to understand, and easy to obtain. We dubbed that index "ICA" (GBIF Spain 2010), after its name in Spanish, "Índice de Calidad Aparente" (Apparent Quality Index). We say that ICA measures "apparent quality" because a dataset can, although this is unlikely, have a high ICA while its records are actually a poor reflection of the reality to which they refer. ICA summarizes data quality along the three primary dimensions of biodiversity data: taxonomic, geospatial, and temporal.

In this contribution we present the rationale behind the ICA, how it is calculated, how it works within the Darwin Test tool (Ortega-Maqueda and Pando 2008), how it is integrated into the data publication processes of GBIF Spain, and some results and discussion of its utility and potential. We also compare ICA to the emerging framework for data quality assessment (TDWG Data Quality Interest Group 2016).

Keywords

Data quality, biodiversity informatics, fitness for use, occurrence datasets

Presenting author

Francisco Pando

References
