Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Roderic Page (roderic.page@glasgow.ac.uk)
Received: 03 Oct 2024 | Published: 04 Oct 2024
© 2024 Roderic Page
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Page R (2024) Who is Doing Taxonomy, Whereabouts, and Who Is Funding Them? A Practical Test of What Knowledge Graphs Can Tell Us about Taxonomic Research. Biodiversity Information Science and Standards 8: e138477. https://doi.org/10.3897/biss.8.138477
What is the current state of taxonomy? Quentin Wheeler on his podcast "Species Hall of Fame" fears for taxonomy's future, whereas Lucas Joppa and colleagues have famously argued that we've never had so many taxonomists as we do now (
The immediate motivation for this talk comes from a tool I recently developed to track the recent taxonomic literature. Inspired by work by the late David Remsen on uBioRSS (
To navigate this data, I created a simple web site that provides a treemap view of the GBIF classification, a map, and a list of works ordered from most recent to oldest (Fig.
Screenshot of BioRSS showing recent papers on Arachnida in China. Other combinations of taxa and geography can be explored using the treemap and geographic maps on the left.
ORCID helpfully provides their data in RDF in JavaScript Object Notation for Linked Data (JSON-LD) format, which we can use to create a simple knowledge graph connecting people, places, publications, and organisations (Fig.
Simplified version of the data model used by ORCID to export data in RDF. The labels for nodes and edges in the graph come from schema.org.
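As a rough illustration of how a schema.org JSON-LD record for a person might be turned into graph edges, the sketch below flattens one record into (subject, predicate, object) triples using only the Python standard library. The record, its identifiers, the use of `@reverse` for authored works, and the helper names `as_list` and `person_to_triples` are all assumptions made for this example, not a description of the ORCID export or of the code behind the talk.

```python
import json

# Hypothetical record, loosely modelled on a schema.org Person in JSON-LD.
# All identifiers and the exact field layout are illustrative assumptions.
PERSON_RECORD = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Person",
  "@id": "https://example.org/person/1",
  "name": "Example Taxonomist",
  "affiliation": {
    "@type": "Organization",
    "@id": "https://example.org/org/1",
    "name": "Example Museum"
  },
  "@reverse": {
    "creator": [
      {
        "@type": "CreativeWork",
        "@id": "https://example.org/work/1",
        "name": "A new species of Example"
      }
    ]
  }
}
""")


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def describe(node):
    """Emit type and name triples for one node."""
    triples = [(node["@id"], "rdf:type", "schema:" + node["@type"])]
    if "name" in node:
        triples.append((node["@id"], "schema:name", node["name"]))
    return triples


def person_to_triples(doc):
    """Flatten a Person record into (subject, predicate, object) triples."""
    person = doc["@id"]
    triples = describe(doc)
    # Forward edges: person -> organisation.
    for org in as_list(doc.get("affiliation", [])):
        triples.append((person, "schema:affiliation", org["@id"]))
        triples.extend(describe(org))
    # Reverse edges: the listed item points at the person, e.g. work -> creator.
    for prop, items in doc.get("@reverse", {}).items():
        for item in as_list(items):
            triples.append((item["@id"], "schema:" + prop, person))
            triples.extend(describe(item))
    return triples


for triple in person_to_triples(PERSON_RECORD):
    print(triple)
```

A dedicated RDF library would handle contexts and blank nodes more faithfully; the plain-tuple form is used here only to make the shape of the graph visible.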
Simplified data model for a bibliographic record showing links between a work, its author(s) and funder(s). The labels for nodes and edges in the graph come from schema.org.
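The work, author, and funder links in this model can be sketched in the same way. The example below maps a bibliographic record onto schema.org-labelled triples, assuming a Crossref-style layout (`DOI`, `title`, `author` entries with an optional `ORCID`, `funder` entries with an optional `DOI`). The sample record, its DOIs, and the function name `work_to_triples` are made up for illustration; the placeholder identifiers minted for authors and funders that lack a persistent identifier are likewise an assumption, not the approach taken in the talk.

```python
# Hypothetical bibliographic record in a Crossref-like layout.
# The DOIs and names are placeholders, not real identifiers.
WORK_RECORD = {
    "DOI": "10.9999/example.1",
    "title": ["A new species of Example"],
    "author": [
        {"given": "Ada", "family": "Example", "ORCID": "https://example.org/person/1"},
        {"given": "Ben", "family": "Sample"},  # no ORCID recorded
    ],
    "funder": [
        {"name": "Example Science Foundation", "DOI": "10.9999/funder.1"},
    ],
}


def work_to_triples(record):
    """Map one work record onto (subject, predicate, object) triples."""
    doi = record["DOI"]
    work = "https://doi.org/" + doi
    triples = [(work, "rdf:type", "schema:ScholarlyArticle")]
    for title in record.get("title", []):
        triples.append((work, "schema:name", title))

    for i, author in enumerate(record.get("author", [])):
        # Use the ORCID when present; otherwise mint a local placeholder node.
        person = author.get("ORCID") or f"_:{doi}#author{i}"
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p)
        triples.append((work, "schema:author", person))
        triples.append((person, "schema:name", name))

    for i, funder in enumerate(record.get("funder", [])):
        # Funders with an identifier become shared nodes; others stay local.
        if funder.get("DOI"):
            org = "https://doi.org/" + funder["DOI"]
        else:
            org = f"_:{doi}#funder{i}"
        triples.append((work, "schema:funder", org))
        triples.append((org, "schema:name", funder.get("name", "")))

    return triples


for triple in work_to_triples(WORK_RECORD):
    print(triple)
```

Authors and funders that carry a persistent identifier collapse onto a single node across works, which is what lets the graph connect one person's publications; those without an identifier remain isolated until they are reconciled.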
The final part of the knowledge graph is the connection between taxonomic names and works. One approach would be to use the RSS feeds harvested by BioRSS, which was the original motivation for this work. However, not all the articles BioRSS aggregates are taxonomic, so we would need to be able to reliably filter out non-taxonomic works. In the absence of such a filter I have used lists of recent taxonomic names and publications from
The talk will discuss the construction of this knowledge graph, lessons learnt along the way, and what it tells us about taxonomists and their funders. It will also cover strategies for the inevitable gap-filling required to flesh out the knowledge graph. Preliminary results reveal that author affiliations and funding are often not recorded in either ORCID or CrossRef, so we will have to either use proprietary databases (such as Dimensions) or scrape this information from the Web. The latter approach is likely to benefit from recent developments in machine learning, for example using Large Language Models (LLMs) to parse the acknowledgements section of a paper and extract details of funders and grants. Prospects for these methods will be discussed.
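To make the gap-filling problem concrete, the following sketch takes a small edge list and reports which works have no funder, which authors have no affiliation, and how many works each funder supports. The edge list, the identifiers, and the function name `gap_report` are hypothetical; a real knowledge graph would more likely be queried with SPARQL or a similar graph query language.

```python
from collections import Counter

# Toy edge list; every identifier is a made-up placeholder.
EDGES = [
    ("work:1", "schema:author", "person:a"),
    ("work:1", "schema:funder", "org:x"),
    ("work:2", "schema:author", "person:a"),
    ("work:2", "schema:author", "person:b"),
    ("work:3", "schema:author", "person:b"),
    ("work:3", "schema:funder", "org:x"),
    ("person:a", "schema:affiliation", "org:y"),
]


def gap_report(edges):
    """Summarise missing funder and affiliation links in an edge list."""
    # Treat anything with an author edge as a work.
    works = {s for s, p, _ in edges if p == "schema:author"}
    funded = {s for s, p, _ in edges if p == "schema:funder"}
    authors = {o for _, p, o in edges if p == "schema:author"}
    affiliated = {s for s, p, _ in edges if p == "schema:affiliation"}
    works_per_funder = Counter(o for _, p, o in edges if p == "schema:funder")
    return {
        "works_without_funder": sorted(works - funded),
        "authors_without_affiliation": sorted(authors - affiliated),
        "works_per_funder": dict(works_per_funder),
    }


report = gap_report(EDGES)
print(report)
```

The two "without" lists are the candidates for gap-filling from other sources, such as the acknowledgements text of the paper itself.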
Keywords: linked data, taxonomy, knowledge graph, funding
Presenting author: Roderic Page
Presented at: SPNHC-TDWG 2024