Biodiversity Information Science and Standards:
Conference Abstract
Corresponding author: Jocelyn Pender (pender.jocelyn@gmail.com)
Received: 30 Sep 2020 | Published: 09 Oct 2020
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Pender J, Sachs J, Lujan-Toro B, Macklin J (2020) Even Simple Habitat Ontologies are Hard to Use. Biodiversity Information Science and Standards 4: e59190. https://doi.org/10.3897/biss.4.59190
An essential component in describing, delimiting, and understanding the evolutionary context of a taxon is characterizing the habitats in which the taxon is found. We report on a simple habitat ontology that we have developed, and on our ongoing experience using volunteers to annotate legacy habitat descriptions with terms from the ontology.
Our botanical informatics group is building the Canadian Flora Commons, a knowledge platform to aggregate and integrate information about Canadian plants and to facilitate collaboration on it. Species pages in the Commons are seeded with structured data extracted from authoritative sources such as the Flora of North America (FNA) and the Flora of British Columbia. In previous TDWG talks (e.g.,
Environment ontologies that could aid in the standardization of habitat descriptors exist, notably ENVO (ENVironment Ontology;
To address this, we developed a small and simple habitat ontology by examining over 3000 habitat descriptions across multiple families and asking “what is the author trying to tell us?”. In our taxonomic treatment authoring tool, being developed as part of another project, we will use this ontology to replace or supplement the single “habitat” field with multiple habitat dimensions (“soil type”, “canopy coverage”, etc.), some with controlled vocabularies (e.g., {open, closed, partial} for canopy coverage). We are also “translating” legacy habitat descriptions into instance data for the ontology. This process is time-consuming, and its results may depend on the translator's interpretations. The crowdsourcing experiment described below aims to address the first issue (time) and to quantify the second (translator subjectivity).
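As an illustration of replacing a single free-text “habitat” field with several dimensions, the following minimal Python sketch models two of the dimensions named above. The class and field names (`HabitatRecord`, `CanopyCoverage`, `legacy_text`, `parse_canopy`) are hypothetical and are not taken from the ontology or the authoring tool; only the canopy coverage vocabulary {open, closed, partial} comes from the text, and treating soil type as free text is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CanopyCoverage(Enum):
    """Controlled vocabulary for canopy coverage, as listed in the text."""
    OPEN = "open"
    CLOSED = "closed"
    PARTIAL = "partial"


@dataclass
class HabitatRecord:
    """One habitat description split into dimensions (hypothetical layout)."""
    legacy_text: str                                  # original free-text description
    canopy_coverage: Optional[CanopyCoverage] = None  # controlled dimension
    soil_type: Optional[str] = None                   # assumed free text here


def parse_canopy(value: str) -> CanopyCoverage:
    """Map a string to the vocabulary; raises ValueError for unknown terms."""
    return CanopyCoverage(value.strip().lower())


# A translator annotating the legacy phrase "pine forest" as closed canopy.
record = HabitatRecord(
    legacy_text="pine forest",
    canopy_coverage=parse_canopy("Closed"),
)
```

Rejecting values outside the vocabulary keeps a controlled dimension consistent, while the retained `legacy_text` allows the original wording to be re-examined when annotators disagree.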
With our centre's support, we recruited a team of volunteers (6–8 at any given time), and taught them how to annotate habitat descriptions with WebProtégé (
Although a purpose-built habitat ontology offers advantages over existing environment ontologies, and consensus was reached on habitat class definitions (e.g., moisture, elevation, canopy coverage), we found it difficult to achieve consensus on how those classes should be applied. Between the two groups, shared annotations accounted for 57% of all annotations added to terms and phrases, and unique annotations for 43%. This aligns with previous efforts to build a controlled vocabulary for FNA treatments, where differences between term categorizations represented 49% of the effort (
The number of unique and shared annotations made by our volunteer habitat ontology group by class. Unique annotations are classes added to a habitat description by only one group (e.g., only one group added “canopy coverage = closed” to the phrase “pine forest”).
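The shared versus unique tally described above can be sketched in a few lines of Python. This is only one plausible way to count, not necessarily the procedure used in the study: it assumes each annotation is a (class, value) pair attached to a phrase, that an annotation added by both groups counts once as shared, and that an annotation added by only one group counts once as unique. The function name `tally_annotations` and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[str, str]           # (habitat class, value)
Annotations = Dict[str, Set[Annotation]]  # phrase -> annotations by one group


def tally_annotations(group_a: Annotations, group_b: Annotations) -> Tuple[int, int]:
    """Return (shared, unique) annotation counts across all phrases."""
    shared = 0
    unique = 0
    for phrase in set(group_a) | set(group_b):
        a = group_a.get(phrase, set())
        b = group_b.get(phrase, set())
        shared += len(a & b)  # added by both groups
        unique += len(a ^ b)  # added by exactly one group
    return shared, unique


# Hypothetical sample data, for illustration only.
group_a: Annotations = {
    "moist pine forest": {("moisture", "moist"), ("canopy coverage", "closed")},
    "sandy soils": {("soil type", "sand")},
}
group_b: Annotations = {
    "moist pine forest": {("moisture", "moist")},
    "sandy soils": {("soil type", "sand")},
}

shared, unique = tally_annotations(group_a, group_b)
shared_fraction = shared / (shared + unique)
```

Grouping the same tally by habitat class instead of by phrase would give per-class counts of the kind the figure caption describes.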
Our talk will describe our findings, discuss the subjectivity of habitat classes and other difficulties we’ve encountered while building our ontology, and demonstrate the power of a habitat-driven search interface. This interface will live alongside parsed morphological descriptions (see dev.floranorthamerica.org). We invite collaboration towards increasing the robustness and applicability of the ontology.
Keywords: ontology, crowdsourcing, environment, botany, floras
Jocelyn Pender
TDWG 2020