Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Dmitry Mozzherin (dmozzherin@gmail.com)
Received: 16 Aug 2024 | Published: 19 Aug 2024
© 2024 Dmitry Mozzherin, Deborah Paul, Amanda Whitmire
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Mozzherin D, Paul D, Whitmire A (2024) Can We Standardize Name Reconciliation via OpenRefine? Biodiversity Information Science and Standards 8: e134910. https://doi.org/10.3897/biss.8.134910
Scientific names in biodiversity are among the oldest identifiers used in science. As a result, reconciling a list of scientific names against curated data sources is a common, repetitive task. Reconciliation lets one determine whether the names in a list are spelled correctly, whether they are currently accepted, and what their nomenclatural status is. Several online and local resources provide reconciliation services. Here we discuss the potential for interoperability across reconciliation tools.
Global Names Verifier (GNverifier), Catalogue of Life, Global Biodiversity Information Facility (GBIF), Taxonomic Name Resolution Service (TNRS), LifeWatch, National Center for Biotechnology Information (NCBI), World Flora Online, Global Biotic Interactions (GloBI), Nomer, Wikidata, and others provide their own tools for name reconciliation. Each tool has its own scope, design decisions, and input and output formats. Reconciling against several such services is often useful because they hold complementary data. However, given the idiosyncrasies of each service and the lack of standardization, this is not an easy task (
Yet standardizing all existing and future resources on a common interface would be difficult. Some lack the funding or programming capacity to modify their code, while others have more urgent priorities. Some resources support a specific research path, where adhering to a rigid standard might hinder innovation. In this paper we propose achieving interoperability between reconciliation tools by having them implement the OpenRefine Reconciliation Service. OpenRefine is a popular and powerful application for reconciliation and data cleaning, used by many researchers for data transformation and normalization. Any service that implements the OpenRefine Reconciliation Service can be incorporated into data-management workflows simply by supplying its OpenRefine-compatible URL. Such services can be made easy to discover by listing their metadata in the OpenRefine Services Registry.
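To make the idea of a shared interface concrete, the following is a minimal sketch of how a client might prepare a batch request in the style of the Reconciliation Service API that OpenRefine uses. It is an illustration under stated assumptions, not the implementation described in this abstract: the endpoint URL, the helper names (`build_queries`, `encode_form_body`, `reconcile`), and the example names are ours, and the URL in particular should be checked against the service's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a reconciliation endpoint for GNverifier. Verify before use.
GNVERIFIER_RECONCILE_URL = "https://verifier.globalnames.org/api/v1/reconcile"


def build_queries(names, limit=3):
    """Build a keyed batch (q0, q1, ...) with one query object per name."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def encode_form_body(queries):
    """Serialize the batch as JSON inside the 'queries' form field."""
    return urlencode({"queries": json.dumps(queries)})


def reconcile(url, names, limit=3):
    """POST the batch to a reconciliation endpoint and return the decoded JSON.

    Not called in this sketch because it needs network access.
    """
    body = encode_form_body(build_queries(names, limit)).encode("utf-8")
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request) as response:
        return json.load(response)


# Illustrative input only.
names = ["Puma concolor", "Bubo bubo"]
queries = build_queries(names)
body = encode_form_body(queries)
```

Because every compliant service accepts the same request shape, switching from one service to another in such a client would, in principle, only require changing the URL passed to `reconcile`.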
In this paper we discuss our implementation of the OpenRefine Reconciliation Service in the Global Names Verifier (GNverifier) reconciliation tool.
GNverifier is developed at the Species File Group as part of the Global Names Architecture initiative. It offers a powerful, configurable, and fast way to reconcile scientific names. The software aggregates data from more than 100 source datasets. Queries return the currently accepted name when the source dataset provides one. GNverifier can find matches for names that have historically been written with several different suffixes, and it supports fuzzy and partial matching. It ranks results by many factors so that the best available match is reliably returned. With a strong focus on software optimization and a sophisticated matching algorithm, it can process 2,000 names per second, making it one of the fastest services available.
OpenRefine can use GNverifier directly because it is compatible with the OpenRefine protocol. As shown in Fig.
OpenRefine makes it easy to choose between several reconciliation services, in this case Wikidata and Global Names Verifier.
Implementation of the OpenRefine protocol might solve many standardization problems. Some resources already have it implemented (e.g., Wikidata, GNverifier
A basic reconciliation example in which rows 11-12 require a human to make a choice, while rows 13-15 were reconciled automatically.
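The split described in this caption, between rows that reconcile automatically and rows that need a human decision, can be sketched in terms of the `match` flag that reconciliation services attach to each candidate. The snippet below is a simplified illustration, not OpenRefine's actual logic: the function name `triage`, the rule of accepting only a single confident candidate, and all values in `sample_response` (including the placeholder identifiers and scores) are assumptions made for the example.

```python
def triage(response):
    """Separate confidently matched queries from those needing human review.

    A query counts as auto-matched when exactly one candidate has match=True;
    everything else is kept, with all of its candidates, for a person to decide.
    """
    auto_matched, needs_review = {}, {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        confident = [c for c in candidates if c.get("match")]
        if len(confident) == 1:
            auto_matched[key] = confident[0]
        else:
            needs_review[key] = candidates
    return auto_matched, needs_review


# Invented response in the general shape of a reconciliation result batch.
sample_response = {
    "q0": {"result": [
        {"id": "example-1", "name": "Puma concolor", "score": 1.0, "match": True},
    ]},
    "q1": {"result": [
        {"id": "example-2", "name": "Bubo bubo", "score": 0.7, "match": False},
        {"id": "example-3", "name": "Bubo bubo", "score": 0.7, "match": False},
    ]},
}

auto_matched, needs_review = triage(sample_response)
```

In this invented batch, `q0` lands in `auto_matched` and `q1`, with two equally plausible candidates, lands in `needs_review`.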
Beyond the basic reconciliation (as seen in Fig.
We think adopting the OpenRefine reconciliation protocol would be a significant step toward standardization across name-reconciliation tools.
Keywords: name-reconciliation services, interoperability, FAIR, Global Names Verifier
Presenting author: Dmitry Mozzherin
Presented at: SPNHC-TDWG 2024
Special thanks to David Shorthouse and Nicky Nicolson for their encouragement, advice, and help in implementing this OpenRefine name-reconciliation tool.