Biodiversity Information Science and Standards : Conference Abstract
Corresponding author: Matthew Collins (mcollins@acis.ufl.edu)
Received: 11 Apr 2018 | Published: 21 May 2018
© 2018 Matthew Collins, Rebecca Tarvin, Martha Kandziora, Wasila Dahdul, Deborah Paul
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Collins M, Tarvin R, Kandziora M, Dahdul W, Paul D (2018) Phenomap - Challenges and Successes in Bringing Together Multiple Data Projects to Build New Visualizations of Phenotypic Information and Specimen Records. Biodiversity Information Science and Standards 2: e25698. https://doi.org/10.3897/biss.2.25698
Connecting biodiversity data across databases is harder than one might expect. Different databases use different identifiers and taxonomies, and connecting their data often results in a loss of information and precision. Here we present some of the challenges we faced in integrating multiple biodiversity data sets, including specimen data from scientific collections, during a hackathon hosted by the Phenoscape project in December 2017. The hackathon brought together a diverse group of participants, including biologists and software developers, to explore ways of using the computable phenotype data in the Phenoscape Knowledgebase (KB) (
Phenoscape uses terms from anatomy, quality, and taxonomy ontologies to annotate characters and taxonomic information from the phylogenetic literature, along with specimen information. When populating the KB, specimen identifiers such as occurrence identifiers, collector numbers, and catalog numbers were preserved if they were present in the literature. We found that these identifiers, although standard in the biodiversity domain, were mostly insufficient to uniquely identify the source specimen in iDigBio. Instead, we mapped all occurrences of each taxon by string matching the genus and species names associated with Vertebrate Taxonomy Ontology identifiers. Without specimen identifiers that are consistent across databases, we lost the ability to explore spatial and temporal variation of characters within genera and could only explore phenotypes and geographic distributions among genera. We look forward to discussing these issues with the collections community, represented at this meeting by the Society for the Preservation of Natural History Collections (SPNHC).
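The fallback described above can be sketched roughly as follows. This is a minimal Python illustration, not the hackathon's code (the app itself was written in R); the record classes, field names, catalog numbers, and taxon names are all hypothetical, and no iDigBio API is called. The sketch accepts a catalog number only if it picks out exactly one record, and otherwise drops to genus + species string matching, which is the point where specimen-level precision is lost.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KBSpecimen:
    """Hypothetical specimen citation attached to a KB annotation."""
    taxon_name: str                # genus + species, e.g. taken from an ontology label
    catalog_number: Optional[str]  # often missing, or lacking an institution code


@dataclass
class Occurrence:
    """Hypothetical, simplified occurrence record from a specimen aggregator."""
    uuid: str
    scientific_name: str
    catalog_number: Optional[str]


def genus_species(name: str) -> str:
    """Reduce a name to lowercase 'genus species', dropping authorship and anything after."""
    return " ".join(name.split()[:2]).lower()


def match_specimen(spec: KBSpecimen, occurrences: List[Occurrence]) -> Tuple[str, List[str]]:
    """Return (match level, matching occurrence uuids).

    A catalog number is accepted only if it identifies exactly one record;
    otherwise fall back to genus + species string matching, which returns
    every occurrence of the taxon instead of the source specimen.
    """
    if spec.catalog_number:
        hits = [o.uuid for o in occurrences if o.catalog_number == spec.catalog_number]
        if len(hits) == 1:
            return "specimen", hits
    key = genus_species(spec.taxon_name)
    return "taxon", [o.uuid for o in occurrences if genus_species(o.scientific_name) == key]


# Illustrative records only: catalog number "1234" occurs in two collections.
occs = [
    Occurrence("a1", "Examplia alpha Smith, 1900", "XYZ 5678"),
    Occurrence("a2", "Examplia alpha", "1234"),
    Occurrence("a3", "Examplia alpha", "1234"),
    Occurrence("b1", "Examplia beta", "XYZ 99"),
]
ambiguous = match_specimen(KBSpecimen("Examplia alpha", "1234"), occs)
unique = match_specimen(KBSpecimen("Examplia beta", "XYZ 99"), occs)
print(ambiguous)  # ('taxon', ['a1', 'a2', 'a3'])
print(unique)     # ('specimen', ['b1'])
```

In the ambiguous case, a single cited specimen expands to every record of its species, so any downstream map can only speak about the taxon.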
We developed an R Shiny application that integrates characters and taxa from Phenoscape with specimen records from iDigBio and phylogenies from the Open Tree of Life (OT) to visualize phenotypic characters and taxon distributions in three interactive panels. The app allows a user to visualize OT phylogenies and map presence/absence character data onto the tree. Specifically, users can select taxa or specific characters to visualize their geographic distributions, navigate a phylogeny browser that displays the character and specimen data available for the taxa under consideration, and view a heatmap of data availability across character and taxon combinations. Because of the difficulties we had joining the data, our distribution map gives users the impression that all individuals in a genus exhibit a character, whereas the KB was populated with data describing individual specimens. We hope that, as data standards improve and are adopted more widely, building applications like ours will become easier.
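To make the genus-level problem concrete, here is a minimal Python sketch of two steps an app like this might perform. It assumes a hypothetical, simplified annotation shape rather than the actual KB or app data structures, and the taxa and character states are invented. It counts annotations per taxon and character (the values a heatmap would color) and rolls species-level presence/absence up to genus while keeping the set of observed states.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (taxon name, character label, present?): a hypothetical, simplified annotation shape
Annotation = Tuple[str, str, bool]


def availability_matrix(annotations: List[Annotation]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Count annotations per (taxon, character) pair, as rows of taxa and columns of characters."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for taxon, character, _present in annotations:
        counts[(taxon, character)] += 1
    taxa = sorted({t for t, _, _ in annotations})
    chars = sorted({c for _, c, _ in annotations})
    return taxa, chars, [[counts[(t, c)] for c in chars] for t in taxa]


def genus_rollup(annotations: List[Annotation]) -> Dict[Tuple[str, str], Set[bool]]:
    """Collapse species-level presence/absence to genus level.

    A genus whose species disagree ends up with {True, False}; displaying it
    simply as 'present' would hide that within-genus variation.
    """
    states: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for taxon, character, present in annotations:
        genus = taxon.split()[0]
        states[(genus, character)].add(present)
    return dict(states)


# Illustrative annotations only, not Phenoscape KB content.
anns = [
    ("Examplia alpha", "dorsal fin spine", True),
    ("Examplia beta", "dorsal fin spine", False),
    ("Examplia alpha", "adipose fin", False),
]
taxa, chars, matrix = availability_matrix(anns)
print(taxa, chars, matrix)  # ['Examplia alpha', 'Examplia beta'] ['adipose fin', 'dorsal fin spine'] [[1, 1], [0, 1]]
print(genus_rollup(anns))
```

Keeping every observed state per genus is one possible design choice; coloring a genus on a map by "any species present" would reproduce the over-generalization noted above.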
phenoscape, idigbio, phylogeny, trait, linked data
Matthew Collins
Biodiversity Information Standards (TDWG) 2018, Dunedin, NZ