Biodiversity Information Science and Standards : Conference Abstract
Corresponding author: Lyubomir Penev (penev@pensoft.net)
Received: 30 Mar 2019 | Published: 13 Jun 2019
© 2019 Lyubomir Penev, Teodor Georgiev, Mariya Dimitrova, Yasen Mutafchiev, Pavel Stoev, Robert Mesibov
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Penev L, Georgiev T, Dimitrova M, Mutafchiev Y, Stoev P, Mesibov R (2019) Data Auditing, Cleaning and Quality Assurance Workflows from the Experience of a Scholarly Publisher. Biodiversity Information Science and Standards 3: e35019. https://doi.org/10.3897/biss.3.35019
Data publishing has become an important task on the agenda of many scholarly publishers over the last decade, but far less attention has been paid to the actual reviewing and quality checking of the published data. Quality checks are often delegated to the reviewers of the article narrative, many of whom may not be qualified to provide a professional data review. This talk presents the workflows developed and used by Pensoft journals to provide data auditing, cleaning and quality assurance. These are:
A data auditing and cleaning workflow implemented at Pensoft for all datasets published as data papers.
Over many years' experience in data publishing, we have realised that data quality checking and assurance require specific knowledge and competencies, which also vary with the method of data handling and management, such as relational databases, semantic XML tagging, Linked Open Data, and others. This process cannot be entrusted to peer reviewers alone; it requires the participation of dedicated data scientists and information specialists in the routine publishing process. This is the only way to make published biodiversity data, such as taxon descriptions, occurrence records, biological observations and specimen characteristics, truly FAIR (Findable, Accessible, Interoperable, Reusable), so that they can be merged, reformatted and incorporated into novel and visionary projects, regardless of whether they are accessed by a human researcher or a data-mining process.
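To illustrate the kind of routine checks a data audit involves, the sketch below validates Darwin Core-style occurrence records for missing required fields, out-of-range coordinates, and unparseable event dates. This is a minimal, hypothetical example in Python, not Pensoft's actual tooling; the field names follow the Darwin Core standard, but the required-field set and date format are illustrative assumptions.

```python
# Minimal audit pass over Darwin Core-style occurrence records (illustrative
# sketch only): flags missing required fields, out-of-range coordinates, and
# eventDate values that do not parse as ISO dates.
from datetime import datetime

# Assumed required fields for this example; a real audit would use a
# project-specific checklist.
REQUIRED_FIELDS = {"scientificName", "decimalLatitude", "decimalLongitude", "eventDate"}

def audit_record(record):
    """Return a list of human-readable issues found in one occurrence record."""
    issues = []
    for field in sorted(REQUIRED_FIELDS - record.keys()):
        issues.append(f"missing field: {field}")
    # Coordinate checks: only run when a value is present.
    lat_s = record.get("decimalLatitude")
    lon_s = record.get("decimalLongitude")
    if lat_s is not None:
        try:
            lat = float(lat_s)
            if not -90 <= lat <= 90:
                issues.append(f"latitude out of range: {lat_s}")
        except ValueError:
            issues.append(f"non-numeric latitude: {lat_s}")
    if lon_s is not None:
        try:
            lon = float(lon_s)
            if not -180 <= lon <= 180:
                issues.append(f"longitude out of range: {lon_s}")
        except ValueError:
            issues.append(f"non-numeric longitude: {lon_s}")
    # Date check: expect ISO 8601 calendar dates (an assumption for brevity).
    date = record.get("eventDate")
    if date is not None:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            issues.append(f"unparseable eventDate: {date}")
    return issues

record = {"scientificName": "Aus bus", "decimalLatitude": "95.2",
          "decimalLongitude": "23.1", "eventDate": "2019-06-13"}
print(audit_record(record))  # the latitude 95.2 is flagged as out of range
```

Real auditing pipelines layer many more checks (taxon-name matching, controlled vocabularies, duplicate detection) on top of such basic field validation, but the structure — record in, list of issues out — stays the same.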
Presenting author: Teodor Georgiev