Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Lee Belbin (leebelbin@gmail.com)
Received: 31 Jul 2022 | Published: 01 Aug 2022
© 2022 Lee Belbin, Arthur Chapman, Paul J. Morris, John Wieczorek
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Belbin L, Chapman A, Morris PJ, Wieczorek JR (2022) It Takes Years for a Good Wine to Mature: Task Group 2 - data quality tests and assertions. Biodiversity Information Science and Standards 6: e91078. https://doi.org/10.3897/biss.6.91078
Data Quality Task Group 2 was established to create a suite of core tests and associated assertions about the 'quality' of biodiversity informatics data.
Why has it gone so slowly? This is mostly due to the complexity of the task and the inability to meet face-to-face. Zoom just doesn’t cut it for this type of work. We achieved the most at our one face-to-face meeting in Gainesville (Florida) in 2018. Our advances over the past year have come from iterative rounds of feedback among the test specifications, the test implementations, the development of data for validating the tests, and comparisons of implementation results against the expectations of the validation data. There are, we hope, useful lessons in this for similar projects.
We now have a solid base from which future evolution, such as tests for specific environments, will be relatively easy. The major components of this project are the 99 tests themselves, the parameters for these tests (see https://github.com/tdwg/bdq/issues/122), a vocabulary of the terms used in the framework, and test data for validating implementations of the tests.
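To illustrate how test parameters are intended to work, the sketch below shows a hypothetical parameterised Validation in Python. The function name, the default source authority, the Darwin Core term, and the response strings are placeholders chosen for illustration under our reading of the framework; this is not the normative BDQ specification or a reference implementation.

```python
# Hypothetical sketch of a parameterised Validation (not normative BDQ code).
# A parameter such as a source authority has a sensible default, but an
# implementation may override it for a specific environment.

DEFAULT_COUNTRY_CODES = {"AU", "BR", "US", "ZA"}  # placeholder for a real source authority


def validation_countrycode_standard(country_code, source_authority=DEFAULT_COUNTRY_CODES):
    """Return a (response status, response result) pair for a dwc:countryCode value."""
    if country_code is None or country_code.strip() == "":
        # The test cannot run on empty input.
        return ("internal prerequisites not met", None)
    if country_code.strip().upper() in source_authority:
        return ("run has result", "compliant")
    return ("run has result", "not compliant")


# Overriding the parameter for a regional deployment:
print(validation_countrycode_standard("AU"))                 # uses the default authority
print(validation_countrycode_standard("XX", {"XX", "YY"}))   # uses a custom authority
```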
We remain focused on what we call core tests: those that provide power in evaluating ‘fitness for use’, are widely applicable, and are relatively easy to implement. The test descriptions we have settled on cover four types of test: Validations, Issues, Amendments, and Measures.
The composition of the core tests has been stable for over a year. We have generated most of the test data using a standard template: the applicable test, a unique identifier, the input data, the expected output data, the response status (e.g., “internal prerequisites not met”), the response result (e.g., “not compliant”), and an optional comment.
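A minimal sketch of what one validation-data record might look like under this template, and of how an implementation's output could be compared against it, is given below. The field names, the test name, the identifier, and the checking function are assumptions made for illustration only, not the actual TG2 test data format.

```python
# Hypothetical validation-data record following the template described above.
# Field names and values are illustrative, not the actual TG2 test data.
record = {
    "test": "VALIDATION_COUNTRYCODE_STANDARD",   # the applicable test (placeholder name)
    "id": "example-001",                          # a unique identifier for this case
    "input": {"dwc:countryCode": ""},             # input data supplied to the test
    "expected_output": {},                        # expected output data (none for a validation)
    "response_status": "internal prerequisites not met",  # expected status for empty input
    "response_result": None,                      # no result when prerequisites are not met
    "comment": "An empty countryCode cannot be validated.",  # optional comment
}


def check_implementation(record, run_test):
    """Compare an implementation's response for this record against the expectations."""
    status, result = run_test(record["input"])
    return (status == record["response_status"]
            and result == record["response_result"])
```

A harness of this kind reflects the feedback loop described above: run each implementation over the validation data and flag any case where the response status or result differs from the expectation.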
What remains to be done? We need to complete the test data, produce normative and non-normative documentation, and transform our work into a TDWG Technical Specification. While TG2's work is over 95% complete, we would still welcome contributions from anyone interested in learning about biodiversity data quality.
Keywords: specifications, vocabulary, biodiversity data, validation, amendment, report
Presenting author: Lee Belbin
Presented at: TDWG 2022
Acknowledgements: We acknowledge the significant contributions of Paula Zermoglio and Alex Thompson as original TG2 team members. We also value the comments of Deborah Paul and Allan Koch Veiga on our GitHub issues.