Biodiversity Information Science and Standards :
Conference Abstract
|
Corresponding author: Matthew Yoder (diapriid@gmail.com)
Received: 17 Sep 2021 | Published: 20 Sep 2021
© 2021 Matthew Yoder, Dmitry Dmitriev
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Yoder M, Dmitriev DA (2021) Nomenclature over 5 years in TaxonWorks: Approach, implementation, limitations and outcomes. Biodiversity Information Science and Standards 5: e75441. https://doi.org/10.3897/biss.5.75441
|
We are now over four decades into digitally managing the names of Earth's species. As the number of federating (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating (e.g., International Plant Name Index, Catalog of Life (CoL)) efforts increase, there remains an unmet need for both the migration forward of old data, and for the production of new, precise and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort, and where it might evolve in the future.
In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity, but rather nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changing (a verb commonly used to express frustration with biological nomenclature), it is useful to think of nomenclature as a set of data points, which grows over time. For example, when synonymy happens, we do not erase the past, but rather record a new context for the name(s) in question. The biological concept changes, but the nomenclature (names) simply keeps adding up.
Behind the scenes, nomenclature in TaxonWorks is represented by a set of nodes and edges, i.e., a mathematical graph, or network (e.g., Fig.
Visualizing nomenclature in TaxonWorks as a network (graph). Over 700K data points corresponding to over 120K names, their relationships, status, and citations in the 3i World Auchenorrhyncha Database (
Screenshot of the "Browse nomenclature" interface in TaxonWorks, showing the nomenclatural history of a name used for cicadas.
As in any complex knowledge-representation model, there are outlying scenarios, or edge cases that emerge, making certain human tasks more complex than others. TaxonWorks is no exception, it has limitations in terms of what and how some things can be represented. While many complex representations are hidden by simplified user-interfaces, some, for example, the handling of the ICZN's Family-group name, batch-loading of invalid relationships, and comparative syncing against external resources need more work to simplify the processes presently required to meet catalogers' needs.
The depth at which TaxonWorks can capture nomenclature is only really valuable if it can be used by others. This is facilitated by the application programming interface (API) serving its data (https://api.taxonworks.org), serving text files, and by exports to standards like the emerging Catalog of Life Data Package. With reference to real-world problems, we illustrate different ways in which the API can be used, for example, as integrated into spreadsheets, through the use of command line scripts, and serve in the generation of public-facing websites.
Behind all this effort are an increasing number of people recording help videos, developing documentation, and troubleshooting software and technical issues. Major contributions have come from developers at many skill levels, from high school to senior software engineers, illustrating that TaxonWorks leads in enabling both technical and domain-based contributions. The health and growth of this community is a key factor in TaxonWork's potential long-term impact in the effort to unify the names of Earth's species.
Matthew Yoder
TDWG 2021
Many people have contributed to refining TaxonWorks and the cloud of resources surrounding it, see https://github.com/SpeciesFileGroup/taxonworks/blob/development/CONTRIBUTORS.md for more.