Corresponding author: Lee Belbin
Other than data availability, ‘Data Quality’ is probably the most significant issue for users of biodiversity data and this is especially so for the research community.
To address this, the Task Group has been developing:

- A standard core (fundamental) set of tests and associated assertions based around Darwin Core terms
- A standard suite of descriptive fields for each test
- Broad deployment of the tests, from collector to aggregator
- A set of basic principles for the creation of tests/assertions
- Software that provides an example implementation of each test
- Data that can be used to validate an implementation of the tests
- A publication that captures the knowledge built during the creation of the tests/assertions
The tests and rules generating assertions at the record-level are more fundamental than the tools or workflows that will be based on them. The priority is to create a fully documented suite of core tests that define a framework for ready extension across terms and domains.
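The record-level pattern can be illustrated with a minimal sketch (in Python here; the Task Group specifies the tests in an implementation-neutral way). The structure loosely mirrors the framework's status/result/comment response, but the class, the test name, and the example values below are illustrative assumptions, not extracts from the specifications.

```python
from dataclasses import dataclass

@dataclass
class Assertion:
    """One record-level test result, loosely modelled on the framework's
    response structure (status, result, comment)."""
    status: str   # e.g. "RUN_HAS_RESULT" or "INTERNAL_PREREQUISITES_NOT_MET"
    result: str   # e.g. "COMPLIANT" or "NOT_COMPLIANT"; empty if the test did not run
    comment: str  # human-readable explanation of the outcome

def validation_eventdate_notempty(record: dict) -> Assertion:
    """Illustrative core-style validation: does dwc:eventDate contain a value?"""
    value = record.get("dwc:eventDate")
    if value is None or str(value).strip() == "":
        return Assertion("RUN_HAS_RESULT", "NOT_COMPLIANT", "dwc:eventDate is empty")
    return Assertion("RUN_HAS_RESULT", "COMPLIANT", "dwc:eventDate contains a value")

print(validation_eventdate_notempty({"dwc:eventDate": "1963-03-08"}))
print(validation_eventdate_notempty({"dwc:eventDate": ""}))
```

Fixing this shape first is exactly the point made above: any tool or workflow, from a collector's spreadsheet checker to an aggregator's ingest pipeline, can consume the same per-record assertions.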
The core tests have proven to be far more complex than any of the team had anticipated. Several times over the past three years, we believed we had finalized the tests, only to find new issues that required a fresh understanding and subsequent edits, e.g., the most recent dropping of the two tests related to dwc:identificationQualifier.
This decision resulted from a review of dwc:identificationQualifier values in GBIF records and an evaluation of expected values based on the Darwin Core definition of the term. Aside from the sheer number of distinct values, the term expects the qualifier to be interpreted in relation to a given taxonomic name, and the rules of open nomenclature are too unevenly applied across data records for dwc:identificationQualifier to be reliably parsed and detected; without that, these tests cannot be effective.
A similar situation occurs with other Darwin Core terms whose values cannot yet be reliably parsed.
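To make the parsing problem concrete, a minimal sketch follows: a handful of qualifier-like values (constructed for illustration, not quoted from GBIF) run against a naive detector of conventional open-nomenclature abbreviations.

```python
import re

# Constructed examples of the kinds of values found in aggregated records;
# open nomenclature is applied very unevenly.
samples = [
    "cf.",                       # qualifier alone: qualifying which name part?
    "aff. saxatilis",            # qualifier bound to a specific epithet
    "?",                         # bare uncertainty marker
    "sp. nov.",                  # not an identification qualifier at all
    "identification uncertain",  # free text
]

# Naive detector for conventional open-nomenclature abbreviations.
QUALIFIER = re.compile(r"\b(cf|aff|s\.\s?lat|s\.\s?str)\b\.?", re.IGNORECASE)

for value in samples:
    found = bool(QUALIFIER.search(value))
    print(f"{value!r:30} -> {'detected' if found else 'missed'}")
```

Even the detected cases leave the real question unanswered: to which part of the associated dwc:scientificName does the qualifier apply? That dependence on reliable parsing is what led us to drop the tests.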
Months of work have gone into discussions of, and edits to, the tests. We had hoped to hold a face-to-face meeting in Bariloche, Argentina early in 2020, but the coronavirus stopped that. This was unfortunate, as we needed that meeting to work through the remaining complex issues noted above; attempting to address them by Zoom has been far less efficient. We occasionally find ourselves re-visiting decisions made years earlier, an indication that we have been doing this work for (too) many years.

We have now standardized all the test parameters for the 99 tests. Two of the test fields that have taken most of our time to resolve have been 'Parameters' and what we now call 'bdq:sourceAuthority'. We have published the work from the Data Quality Interest and Task Groups, and development of the datasets that validate implementations of the tests continues. We recognize our dependence on the work of others.
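As a sketch of why 'Parameters' and bdq:sourceAuthority took so much effort to resolve: making the source authority an explicit, defaultable parameter lets a single test definition serve implementers who need different reference lists. The test name, default, and tiny code list below are modelled on the suite but are illustrative assumptions, not normative values.

```python
# A tiny, incomplete stand-in for the real ISO 3166-1-alpha-2 code list.
ISO_3166_1_ALPHA_2 = frozenset({"AR", "AU", "BR", "NZ", "US", "ZA"})

def validation_countrycode_standard(record: dict,
                                    source_authority=ISO_3166_1_ALPHA_2) -> dict:
    """Parameterised validation sketch: the bdq:sourceAuthority is an argument
    with a default, so a different code list can be substituted per deployment."""
    code = (record.get("dwc:countryCode") or "").strip()
    if not code:
        # Nothing to test: report unmet prerequisites rather than failure.
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": "",
                "comment": "dwc:countryCode is missing or empty"}
    result = "COMPLIANT" if code in source_authority else "NOT_COMPLIANT"
    return {"status": "RUN_HAS_RESULT", "result": result,
            "comment": f"dwc:countryCode {code!r} checked against the supplied authority"}

print(validation_countrycode_standard({"dwc:countryCode": "AU"}))
print(validation_countrycode_standard({"dwc:countryCode": "Australia"}))  # NOT_COMPLIANT
```

A deployment can thus be configured for a particular community simply by supplying a different source_authority, without forking the test logic.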
We will provide details of the challenges, the breakdown of the tests, and the advances the project has made.
Presenting author: Lee Belbin

Presented at: TDWG 2020