Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Hong Cui (hongcui@email.arizona.edu)
Received: 25 Sep 2021 | Published: 27 Sep 2021
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Cui H, Ford B, Starr J, Macklin J, Reznicek A, Giebink NW, Longert D, Léveillé-Bourret É, Zhang L (2021) Author-Driven Computable Data and Ontology Production for Taxonomists. Biodiversity Information Science and Standards 5: e75741. https://doi.org/10.3897/biss.5.75741
It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable).
The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components:
The presentation will consist of:
The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge.
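As a purely illustrative sketch of the kind of Discrete Event Simulation mentioned above, and not the model used in the study, the following Python code treats term additions as random arrivals, lets each addition raise a logical conflict with a fixed probability, and resolves each conflict independently after a random delay. All names, rates, and probabilities (add_rate, conflict_prob, resolve_rate, horizon) are hypothetical assumptions chosen for illustration, not values from the usability study.

```python
import heapq
import random


def simulate(add_rate=5.0, conflict_prob=0.2, resolve_rate=2.0,
             horizon=72.0, seed=0):
    """Toy discrete event simulation of ontology term additions.

    All parameters are hypothetical: terms arrive at `add_rate` per hour,
    each raises a conflict with probability `conflict_prob`, and each
    conflict is resolved independently after an exponential delay with
    rate `resolve_rate`. Returns (terms_added, conflicts_raised,
    conflicts_resolved) observed within `horizon` hours.
    """
    rng = random.Random(seed)
    # Event queue ordered by time; each entry is (time, kind).
    events = [(rng.expovariate(add_rate), "add")]
    terms = raised = resolved = 0
    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break  # the earliest pending event is already past the horizon
        if kind == "add":
            terms += 1
            if rng.random() < conflict_prob:
                raised += 1
                delay = rng.expovariate(resolve_rate)
                heapq.heappush(events, (time + delay, "resolve"))
            # Schedule the next term addition.
            heapq.heappush(events, (time + rng.expovariate(add_rate), "add"))
        else:
            resolved += 1
    return terms, raised, resolved


if __name__ == "__main__":
    terms, raised, resolved = simulate()
    print("terms added:", terms)
    print("conflicts raised:", raised)
    print("conflicts still open at horizon:", raised - resolved)
```

In a sketch like this, the number of conflicts still open at the horizon, tracked across different assumed rates, gives one rough indication of whether additions and resolutions converge or whether a backlog accumulates. Empirical term submission and conflict resolution data of the kind described above would be needed to replace the assumed parameters.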
We look forward to a discussion on how the tools described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies. Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public.
Keywords: community ontology building, FAIR, taxonomic descriptions, survey, software, ontology-aware data editor, conflicts in ontology, mobile application, Character Recorder, Conflict Resolver
Presenting author: Hong Cui
Presented at: TDWG 2021
Funding program: Advances in Biological Informatics of U.S. National Science Foundation (NSF)