Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: John Thomas Waller (jhnwllr@gmail.com)
Received: 05 Aug 2022 | Published: 23 Aug 2022
© 2022 John Waller
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Waller JT (2022) Finding Data Gaps in the GBIF Backbone Taxonomy. Biodiversity Information Science and Standards 6: e91312. https://doi.org/10.3897/biss.6.91312
When publishers supply GBIF (Global Biodiversity Information Facility) with a dwc:scientificName, the name is sometimes not found in the GBIF backbone taxonomy, which GBIF needs in order to organize occurrences. In these cases, the occurrence records receive a data quality flag called taxon match higher rank, meaning that GBIF was only able to match the name to a higher rank. Matching is the process whereby a name supplied by the publisher is compared to the names already existing in the GBIF backbone taxonomy.
At GBIF, we would always like to match the name supplied by the publisher to the lowest rank possible, so that when a user comes to GBIF looking for a certain name, they will have access to the largest amount of occurrence data possible.
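As a minimal sketch of the matching step described above, the following Python builds a request to GBIF's public species match endpoint (`/v1/species/match`) and checks whether a reply was only a higher-rank match. The helper names are hypothetical, and the sketch assumes the reply's `matchType` field carries the value `HIGHERRANK` in that case. It illustrates the idea from a user's side and is not GBIF's internal occurrence-processing code.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF name-matching endpoint.
MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def build_match_url(scientific_name: str) -> str:
    """Build a species-match request URL for one supplied name."""
    query = urllib.parse.urlencode({"name": scientific_name})
    return MATCH_ENDPOINT + "?" + query


def is_higher_rank_match(response: dict) -> bool:
    """Return True when the backbone only matched the name at a higher rank.

    Assumption: the reply's matchType is "HIGHERRANK" in that case.
    """
    return response.get("matchType") == "HIGHERRANK"


def match_name(scientific_name: str) -> dict:
    """Call the match service (needs network access) and decode the JSON reply."""
    url = build_match_url(scientific_name)
    with urllib.request.urlopen(url, timeout=30) as reply:
        return json.load(reply)
```

For a name such as `Coleoptera indet.`, one would expect `is_higher_rank_match(match_name("Coleoptera indet."))` to be true, although the actual reply depends on the current state of the backbone.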
The main goals of this project were:
In Fig.
Unique names from occurrences supplied to GBIF from publishers that have received the taxon match higher rank flag.
name not matched | reason |
Mystery mystery | bad name |
Sonus naturalis | bad name |
Bambusoideae spec. | subfamily name |
Coleoptera indet. | order name |
Astarte juv. | genus name with life stage |
Gen. sp. | bad name |
Astarte sp. BIOUG14667-B01 | family with id |
Phoneutria depilata (Strand 1909) sp. reval. | species name with remark |
Anisoptera Unknown Dragonfly Species | infra-order name with remarks |
Zygoptera | suborder name |
Philodromus Philodromus albidus / rufus | doubtful identification (alternative) |
Certhia brachydactyla/Certhia familiaris | doubtful identification (alternative) |
Corvus corone x C. cornix | hybrid |
BOLD:ADV7315 | OTU (Operational Taxonomic Unit) |
BOLD:ADX5419 | OTU |
Publishers sometimes do not provide enough information in the dwc:scientificName for GBIF to choose between names in the backbone Fig.
Publishers also supply GBIF with a variety of what I call unmatchable names: names that are impossible to match to the GBIF backbone. Some of these are acceptable names that are nonetheless missing from the backbone, such as hybrids or OTUs (Operational Taxonomic Units). Others are simply bad names that we cannot expect to fix. Examples are listed in the table of unmatched names above.
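Some of the unmatchable categories in the table of unmatched names can be recognized from the name string alone. The sketch below is a hypothetical set of regular-expression heuristics, checked in order, that separates OTUs, hybrids, alternative identifications, and open-nomenclature names (`sp.`, `spec.`, `indet.`, `juv.`, `Gen.`). The rule set and the label "open nomenclature" are assumptions made for illustration, not the procedure used in this project.

```python
import re

# Hypothetical heuristics, checked in order; the first matching rule wins.
RULES = [
    # BOLD barcode index numbers used as OTU identifiers, e.g. "BOLD:ADV7315".
    ("OTU", re.compile(r"^BOLD:[A-Z0-9]+$")),
    # Hybrid formulas with a free-standing "x" or multiplication sign.
    ("hybrid", re.compile(r"\s[x×]\s")),
    # Two alternative identifications separated by a slash.
    ("doubtful identification (alternative)", re.compile(r"/")),
    # Abbreviations signalling an identification above species level.
    ("open nomenclature", re.compile(r"\b(sp|spec|indet|juv|gen)\.", re.IGNORECASE)),
]


def classify_name(name: str) -> str:
    """Return the label of the first rule that matches, or 'unclassified'."""
    cleaned = name.strip()
    for label, pattern in RULES:
        if pattern.search(cleaned):
            return label
    return "unclassified"
```

Names such as `Zygoptera` or `Mystery mystery` fall through to `unclassified`, since nothing in the string itself shows whether the name is a valid higher taxon or a bad name.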
It is often hard to tell if a missing name is a real data gap. To check, I randomly sampled five possibly missing names from each group from Fig.
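The per-group spot check described above can be sketched as a seeded random draw. This is a minimal illustration, assuming the possibly missing names are already held in a dictionary keyed by group; the function name, the seed, and the input structure are hypothetical.

```python
import random


def sample_per_group(names_by_group: dict, k: int = 5, seed: int = 0) -> dict:
    """Draw up to k names at random from each group, without replacement.

    A fixed seed makes the draw repeatable; sorting first makes the result
    independent of the order in which the names were stored.
    """
    rng = random.Random(seed)
    return {
        group: rng.sample(sorted(names), min(k, len(names)))
        for group, names in names_by_group.items()
    }
```

Groups with fewer than `k` candidate names are returned in full, so the total number of names checked need not be an exact multiple of five.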
Around 50% (44 of 86) of the possibly missing names appear to be genuinely missing from the GBIF backbone. We can therefore conservatively assume that there are thousands of missing names in the GBIF backbone. Keep in mind, however, that many missing names are missing synonyms, that is, they are not unique taxon concepts. Halving the 50% figure to 25% gives a conservative minimum estimate of missing names, shown in the table below.
Conservative minimum missing names. Based on conservative judgment, 25% of potentially missing names are genuinely absent from the GBIF backbone. Download a full table of possibly missing names from the groups above here.
group | friendly name | min estimated missing names |
Coleoptera | Beetles | 26,600 |
Lepidoptera | Butterflies | 17,700 |
Passeriformes | Bird order | 4,200 |
Fabales | Plant order | 4,100 |
Asterales | Plant order | 4,000 |
Agaricales | Mushrooms | 1,600 |
Araneae | Spiders | 1,200 |
Rodentia | Rodents | 1,100 |
Carditida | Bivalves | 700 |
Anura | Frogs | 600 |
Carnivora | Carnivores | 300 |
Odonata | Dragonflies | 300 |
Chiroptera | Bats | 200 |
Cyatheales | Ferns | 100 |
Primates | Primates | 100 |
Neuroptera | Insect order | <100 |
Percopsiformes | Fish order | <100 |
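The arithmetic behind the table above is simple enough to write down. In the sketch below, the sampled rate 44/86 is about 0.51, which the text rounds to 50% and then halves to 25% to allow for missing synonyms. The per-group counts of possibly missing names are not given in this abstract, so the input in the usage note is a hypothetical placeholder, and the function names are illustrative only.

```python
def missing_fraction(confirmed_missing: int, sampled: int) -> float:
    """Share of spot-checked names that were genuinely absent from the backbone."""
    return confirmed_missing / sampled


def conservative_minimum(possibly_missing: int, fraction: float = 0.25) -> int:
    """Lower-bound estimate: keep only `fraction` of the candidate names.

    The default of 0.25 is half of the roughly 50% found in the spot check.
    """
    return int(possibly_missing * fraction)
```

For example, a group with a hypothetical 1,000 possibly missing names would yield `conservative_minimum(1000)`, that is, 250 names. The values in the table appear to be rounded to the nearest hundred.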
As a data publisher, you can do a few things to improve name matching to the GBIF backbone.
Keywords: taxonomic backbone, scientific name, data quality
Presenting author: John Thomas Waller
Presented at: TDWG 2022