Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Paula F Zermoglio (pzermoglio@gmail.com)
Received: 02 Apr 2018 | Published: 18 May 2018
© 2018 Paula Zermoglio
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Zermoglio P (2018) Vocabularies of Values: Tackling the Heterogeneity Problem. Biodiversity Information Science and Standards 2: e25438. https://doi.org/10.3897/biss.2.25438
In the process of sharing information, it is of the utmost importance that we use common codes and signifiers so that communication is effective. This process presents a series of complexities related to capturing and transmitting the meaning of information despite homonymy, polysemy and synonymy. Biodiversity data sharing is not exempt from these challenges, and understanding the meaning of the data often requires expert knowledge. For communication to be effective, and therefore for data to be maximally reusable, we need common vocabularies that refer unequivocally to the same concepts.
The community has agreed upon some vocabularies to structure shared information, i.e., biodiversity data standards such as the Darwin Core standard (
While many vocabularies exist in the community, we currently lack a full suite of vocabularies of values that apply uniformly across the biodiversity data community, and there is no single repository in which to explore the available resources. Some of the available vocabularies are discipline-specific, while many that could be applied more broadly remain independent and scattered. Additionally, similar lists of terms referring to the same concepts exist in different languages but remain disconnected from one another.
The lack of, or non-adherence to, vocabularies of values constitutes a data quality issue, because heterogeneity makes data less discoverable and harder to use. Capturing the same information in myriad ways risks making its transmission incomplete and inaccurate. If we cannot be certain that a particular value unambiguously refers to a particular concept, we cannot assert that a record containing that value can reliably be used for a particular purpose. The construction and use of vocabularies of values, including the explicit declaration of how they are used, is therefore itself a data quality matter.
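To make the idea concrete, the following is a minimal Python sketch of how a vocabulary of values might be applied to heterogeneous verbatim values. It is an illustration only, not a product of the task group: the concepts ("male", "female"), their variants (including translations) and the helper names `build_lookup` and `standardize` are all hypothetical. The sketch maps synonyms and translations to one canonical value, refuses to build a vocabulary in which one string would point to two concepts (homonymy), and leaves uncovered values unresolved rather than guessing.

```python
# Hypothetical vocabulary of values: each canonical concept lists the
# verbatim variants (abbreviations, case variants, other languages)
# that should be read as referring to it. Not an official vocabulary.
VOCABULARY = {
    "male": ["male", "m", "macho", "mâle", "männlich"],
    "female": ["female", "f", "hembra", "femelle", "weiblich"],
}


def build_lookup(vocabulary):
    """Invert the vocabulary into a variant -> canonical-value index."""
    lookup = {}
    for concept, variants in vocabulary.items():
        for variant in variants:
            key = variant.strip().casefold()
            if key in lookup and lookup[key] != concept:
                # Homonymy: the same string would point to two concepts.
                raise ValueError(f"ambiguous variant {variant!r}")
            lookup[key] = concept
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def standardize(value):
    """Return the canonical value, or None if the vocabulary does not cover it."""
    if value is None:
        return None
    return LOOKUP.get(value.strip().casefold())


records = ["Male", "f", " hembra ", "M.", "unknown"]
print([standardize(v) for v in records])
# → ['male', 'female', 'female', None, None]
```

Values that come back as `None` (here "M." and "unknown") are exactly the records whose fitness for a given use cannot be asserted until the vocabulary is extended or the data are corrected.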
Within the TDWG Data Quality Interest Group, we have begun to tackle this problem, with the aim of creating a suitable environment for discussing and developing vocabularies of values. Accordingly, a new task group has been constituted, whose main goals are to:
This will provide the community with a framework on which to work and build vocabularies of values in a way that allows better understanding and maximal interoperability.
vocabularies of values, data quality, heterogeneity
Paula F Zermoglio