Biodiversity Information Science and Standards : Conference Abstract
Avenues into Integration: Communicating taxonomic intelligence from sender to recipient
Beckett W. Sterner, Nathan Upham, Atriya Sen, Nico M Franz
‡ Arizona State University, Tempe, United States of America
Open Access

Abstract

“What is crucial for your ability to communicate with me… pivots on the recipient’s capacity to interpret—to make good inferential sense of the meanings that the declarer is able to send” (Rescher 2000, p148).

Conventional approaches to reconciling taxonomic information in biodiversity databases have been based on string matching for unique taxonomic name combinations (Kindt 2020, Norman et al. 2020). However, in their original context, these names pertain to specific usages or taxonomic concepts, which can subsequently vary for the same name as applied by different authors. Name-based synonym matching is a helpful first step (Guala 2016, Correia et al. 2018), but may still leave considerable ambiguity regarding proper usage (Fig. 1). Developing “taxonomic intelligence” is therefore the bioinformatic challenge of adequately representing, and subsequently propagating, this complex name/usage interaction across trusted biodiversity data networks. How do we ensure that senders and recipients of biodiversity data can not only share messages but do so with “good inferential sense” of their respective meanings?

Figure 1.

The problem of taxonomic name/usage (TNU) ambiguity in biodiversity data. Two alternative usages (“1” and “2”) of a species name (“A”) are shown in their geospatial context as circumscribed by a set of georeferenced museum voucher specimens. Those usages were published as taxonomic opinions by given authors, both circumscribing name A for a species-level entity in different ways. Both usages share the same type specimen and locality (shown by a star) and thus the same name A, authority, and year. Name-based string matching is insufficient for parsing this type of TNU change unambiguously. Taxonomically intelligent methods, yet to be developed in a scalable fashion, are instead required.
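The ambiguity shown in Fig. 1 can be illustrated with a minimal Python sketch. The record layout, the field names "name" and "according_to", the usage labels, and the specimen identifiers are all hypothetical and chosen only for illustration; they do not correspond to any particular database schema. Grouping by name string alone merges the two usages, while grouping by the name/usage pair keeps them apart.

```python
from collections import defaultdict

# Hypothetical occurrence records. Both usages carry the same name string "A",
# so the name alone cannot tell them apart.
records = [
    {"specimen": "s1", "name": "A", "according_to": "usage 1"},
    {"specimen": "s2", "name": "A", "according_to": "usage 1"},
    {"specimen": "s2", "name": "A", "according_to": "usage 2"},
    {"specimen": "s3", "name": "A", "according_to": "usage 2"},
]


def group_by(records, key_fields):
    """Collect the specimens that fall under each distinct key."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].add(rec["specimen"])
    return dict(groups)


# Name-based matching: one group, with both circumscriptions merged.
by_name = group_by(records, ["name"])

# Name/usage matching: two groups, one per published usage.
by_tnu = group_by(records, ["name", "according_to"])

print(len(by_name), len(by_tnu))  # 1 2
```

In this toy case the name-only key reports a single taxon containing three specimens, a circumscription that neither author published.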

Key obstacles have involved dealing with the complexity of taxonomic name/usage modifications through time, both in terms of accounting for and digitally representing the long histories of taxonomic change in most lineages. An important critique of proposals to use name-to-usage relationships for data aggregation has been the difficulty of scaling them up to reach comprehensive coverage, in contrast to name-based global taxonomic hierarchies (Bisby 2011). The Linnaean system of nomenclature has some unfortunate design limitations in this regard: taxonomic names are not unique identifiers, their meanings may change over time, and the names as strings of characters do not encode their proper usage, i.e., the name “Genus species” does not specify a source defining how to use the name correctly (Remsen 2016, Sterner and Franz 2017). In practice, many people provide taxonomic names in their datasets or publications but not a source specifying a usage. The information needed to map the relationships between names and usages in taxonomic monographs or revisions is typically not presented in a machine-readable format.

New approaches are making progress on these obstacles. Theoretical advances in the representation of taxonomic intelligence have made it increasingly possible to implement efficient querying and reasoning methods on name-usage relationships (Chen et al. 2014, Chawuthai et al. 2016, Franz et al. 2015). Perhaps most importantly, growing efforts to produce name-usage mappings on a medium scale by data providers and taxonomic authorities suggest an all-or-nothing approach is not required. Multiple high-profile biodiversity databases have implemented internal tools for explicitly tracking conflicting or dynamic taxonomic classifications, including eBird using concept relationships from AviBase (Lepage et al. 2014); NatureServe in its Biotics database; iNaturalist using its taxon framework (Loarie 2020); and the UNITE database for fungi (Nilsson et al. 2019). Other ongoing projects incorporating taxonomic intelligence include the Flora of Alaska (Flora of Alaska 2020), the Mammal Diversity Database (Mammal Diversity Database 2020) and PollardBase for butterfly population monitoring (Campbell et al. 2020).
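One way to picture a name-usage mapping is as a set relationship between two circumscriptions. The following Python sketch assumes, as in Fig. 1, that each usage can be approximated by the set of specimens it includes. The relationship labels, the function names relate and can_relabel, and the specimen identifiers are hypothetical; this is not the method of any of the projects cited above.

```python
def relate(a, b):
    """Return the set relationship of circumscription a to circumscription b."""
    a, b = set(a), set(b)
    if a == b:
        return "congruent"
    if a < b:
        return "included in"
    if a > b:
        return "includes"
    if a & b:
        return "overlaps"
    return "disjoint"


def can_relabel(relationship):
    """A record identified under the sender's usage can be carried over to the
    recipient's usage only if every member of the sender's concept also
    belongs to the recipient's concept."""
    return relationship in {"congruent", "included in"}


# Hypothetical circumscriptions, each given as a set of voucher specimens.
usage_1 = {"s1", "s2"}
usage_2 = {"s2", "s3"}
usage_3 = {"s1", "s2", "s3"}

print(relate(usage_1, usage_2))  # overlaps
print(relate(usage_1, usage_3))  # included in
```

Under these assumptions, a recipient using usage 3 could accept records sent under usage 1 without review, whereas records exchanged between usages 1 and 2 would need checking specimen by specimen.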

Keywords

taxonomic name, taxonomic concept, logic reasoning, artificial intelligence, data integration, Darwin Core, information retrieval, extended specimen

Presenting author

Beckett W. Sterner

Presented at

TDWG 2020

Funding program

National Science Foundation Science and Technology Studies Program

Grant title

Productive Ambiguity in Classification ID#1827993

Conflicts of interest

None declared.

References