Practical use of aggregator data quality metrics in a collection scenario

Andrew Bentley

doi:10.3897/biss.2.25970

Biodiversity Information Science and Standards : Conference Abstract

Andrew Bentley ‡

‡ University of Kansas, Lawrence, KS, United States of America

Corresponding author: Andrew Bentley (abentley@ku.edu)

Received: 18 Apr 2018 | Published: 13 Jun 2018

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Bentley A (2018) Practical use of aggregator data quality metrics in a collection scenario. Biodiversity Information Science and Standards 2: e25970. https://doi.org/10.3897/biss.2.25970

Abstract

The recent incorporation of standardized data quality metrics into the GBIF, iDigBio, and ALA portal infrastructures gives data providers useful information they can use to clean or augment Darwin Core data at the source. Numerous taxonomy- and geography-based metrics report on the quality of various Darwin Core fields in these domains, while also providing input on Darwin Core compliance for other fields. As a data provider and data manager for the Biodiversity Institute, University of Kansas, I have spent some time evaluating the efficacy and reliability of these metrics. This presentation will highlight positive and negative aspects of that experience with specific examples, and raise concerns regarding the user experience and the standardization of these metrics across the aggregator landscape. The metrics have revealed both data and publishing issues, and addressing them has increased the utility and cleanliness of our data; they have also highlighted batch processing challenges and issues with the process of inferring "bad" data. The integration of these metrics into source database infrastructure will also be proposed, with Specify Software as an example.

Keywords

Aggregators, GBIF, iDigBio, metrics, data quality, collections, IPT

Presenting author

Bentley, Andrew C

Presented at

SPNHC/TDWG 2018

Acknowledgements

University of Kansas Biodiversity Institute (KUBI)

Specify Software Project (http://www.sustain.specifysoftware.org/)

Funding program

DBI - ADVANCES IN BIO INFORMATICS

Grant title

ABI Sustaining: Supporting Biological Collections Computing with Specify
