Linking Biodiversity Data Using Evolutionary History
Emily Jane McTavish (ejmctavish@ucmerced.edu)
University of California, Merced, United States of America
Conference Abstract. Biodiversity Information Science and Standards (BISS), ISSN 2535-0897, Pensoft Publishers. doi: 10.3897/biss.3.36207
Session: SS77 - Digital biodiversity data as a frontier for new research avenues
Corresponding author: Emily Jane McTavish (ejmctavish@ucmerced.edu).
Biodiversity Information Science and Standards 3: e36207. Received 14 May 2019; published 21 June 2019.
© 2019 Emily Jane McTavish. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
All life on earth is linked by a shared evolutionary history. Even before Darwin developed the theory of evolution, Linnaeus categorized types of organisms based on their shared traits. We now know these traits derived from these species’ shared ancestry. This evolutionary history provides a natural framework to harness the enormous quantities of biological data being generated today.
The Open Tree of Life project is a collaboration developing tools to curate and share evolutionary estimates (phylogenies) covering the entire tree of life (Hinchliff et al. 2015, McTavish et al. 2017). The tree is viewable at https://tree.opentreeoflife.org, and all of the data are freely available online. The taxon identifiers used in the Open Tree unified taxonomy (Rees and Cranston 2017) are mapped to identifiers across biological informatics databases, including the Global Biodiversity Information Facility (GBIF), NCBI, and others. Linking these identifiers allows researchers to easily unify data from across these different resources (Fig. 1). Leveraging a unified evolutionary framework across the diversity of life provides new avenues for integrative, wide-scale research. Downstream tools, such as the R packages developed by rOpenSci (rotl, rgbif) (Michonneau et al. 2016, Chamberlain 2017) and other tools (Revell 2012), make accessing and combining this information straightforward for students as well as researchers (e.g. https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html).
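The identifier mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own tooling: it assumes the Open Tree v3 name-matching endpoint at `https://api.opentreeoflife.org/v3/tnrs/match_names`, and it assumes a response shaped as `results` -> `matches` -> `taxon`, where each taxon carries an `ott_id` and a `tax_sources` list of `source:id` strings. The function names are hypothetical helpers introduced for this example.

```python
import json
from urllib import request

# Assumed endpoint: Open Tree of Life v3 name-matching service.
TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def match_names(names):
    """POST a list of scientific names to the name-matching service."""
    payload = json.dumps({"names": list(names)}).encode("utf-8")
    req = request.Request(
        TNRS_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def parse_tax_sources(tax_sources):
    """Group "source:id" strings by source, e.g. "gbif:2" -> {"gbif": ["2"]}."""
    linked = {}
    for entry in tax_sources:
        source, sep, identifier = entry.partition(":")
        if sep:  # skip entries that do not have a "source:id" shape
            linked.setdefault(source, []).append(identifier)
    return linked


def linked_ids(response):
    """Map each matched name to its OTT id and external identifiers.

    Assumes the response shape results -> matches -> taxon described in
    the lead-in; only the first match per name is used (a simplification).
    """
    out = {}
    for result in response.get("results", []):
        matches = result.get("matches", [])
        if not matches:
            continue  # the name was not recognized
        taxon = matches[0]["taxon"]
        out[result.get("name", taxon.get("name"))] = {
            "ott_id": taxon.get("ott_id"),
            "sources": parse_tax_sources(taxon.get("tax_sources", [])),
        }
    return out
```

Calling `linked_ids(match_names(["Homo sapiens"]))` would, under these assumptions, return the Open Tree identifier for that name alongside any GBIF and NCBI identifiers recorded for the same taxon, which can then be used to query those resources directly.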
For example, Santorelli et al. (2018) linked evolutionary information from Open Tree with species locality data gathered from a local field study, as well as GBIF species location records, to test a river-barrier hypothesis in the Amazon. By combining these data, the authors were able to test a widely held biogeographic hypothesis across 1952 species in 14 taxonomic groups, and found that a river that had been postulated to drive endemism was in fact not a barrier to gene flow. However, data provenance and taxonomic name reconciliation remain key hurdles to applying data from these large digital biodiversity and evolution community resources to answering biological questions. In the Amazonian river analysis, although the authors used GBIF records as a secondary check on their species records, they relied on an intensive local field study for their major conclusions, and preferred taxon-specific phylogenetic resources over Open Tree where they were available (Santorelli et al. 2018). When Li et al. (2018) assessed large-scale phylogenetic approaches, including Open Tree, for measuring community diversity, they found that synthesis phylogenies were less resolved than purpose-built phylogenies, but were still sufficient for community-level phylogenetic diversity analyses. Even so, data quality concerns have limited adoption of analyses using data from centralized resources (McTavish et al. 2017). Taxonomic name recognition and reconciliation across databases also remain a hurdle for large-scale analyses, despite several ongoing efforts to improve taxonomic interoperability and unify taxonomies, such as Catalogue of Life Plus (Bánki et al. 2018).
In order to support innovative science, large scale digital data resources need to facilitate data linkage between resources, and address researchers' data quality and provenance concerns. I will present the model that the Open Tree of Life is using to provide evolutionary data at the scale of the entire tree of life, while maintaining traceable provenance to the publications and taxonomies these evolutionary relationships are inferred from. I will discuss the hurdles to adoption of these large scale resources by researchers, as well as the opportunities for new research avenues provided by the connections between evolutionary inferences and biodiversity digital databases.
Keywords: evolution, phylogeny, taxonomy, interoperability
Funder: National Science Foundation (http://doi.org/10.13039/100000001)
Conference: Biodiversity_Next 2019, Leiden, The Netherlands. A joint conference by the Global Biodiversity Information Facility (GBIF), a new pan-European Research Infrastructure initiative (DiSSCo), the national resource for digitized information about vouchered natural history collections (iDigBio), Consortium of European Taxonomic Facilities (CETAF), Biodiversity Information Standards (TDWG) and LifeWatch ERIC, the e-Science and Technology European Infrastructure for Biodiversity and Ecosystem Research.
Presenting author
Emily Jane McTavish
Presented at
Biodiversity_Next 2019
Funding program
NSF Division Of Biological Infrastructure, Advances in Biological Informatics #1759846
Grant title
"Cultivating a sustainable Open Tree of Life"
References
Bánki O, Döring M, Holleman A, Addink W (2018) Catalogue of Life Plus: innovating the CoL systems as a foundation for a clearinghouse for names and taxonomy. Biodiversity Information Science and Standards 2. https://doi.org/10.3897/biss.2.26922
Chamberlain S (2017) rgbif: Interface to the Global 'Biodiversity' Information Facility API. Version 0.99. https://CRAN.R-project.org/package=rgbif
Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA (2015) Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proceedings of the National Academy of Sciences 112 (41): 12764-12769. https://doi.org/10.1073/pnas.1423041112
Li D, Trotta L, Marx HE, Allen JM, Sun M, Soltis DE, Soltis PS, Guralnick RP, Baiser BH (2018) For comparing phylogenetic diversity among communities, go ahead and use synthesis phylogenies. bioRxiv. https://doi.org/10.1101/370353
McTavish EJ, Drew BT, Redelings B, Cranston KA (2017) How and Why to Build a Unified Tree of Life. BioEssays 39 (11). https://doi.org/10.1002/bies.201700114
Michonneau F, Brown JW, Winter DJ (2016) rotl: an R package to interact with the Open Tree of Life data. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.12593
Rees JA, Cranston K (2017) Automated assembly of a reference taxonomy for phylogenetic data synthesis. Biodiversity Data Journal 5: e12581. https://doi.org/10.3897/BDJ.5.e12581
Revell LJ (2012) phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution 3 (2): 217-223. https://doi.org/10.1111/j.2041-210X.2011.00169.x
Santorelli S, Magnusson WE, Deus CP (2018) Most species are not limited by an Amazonian river postulated to be a border between endemism areas. Scientific Reports 8 (1). https://doi.org/10.1038/s41598-018-20596-7
Figure 1. Example linking phylogenetic relationships accessed from the Open Tree of Life with specimen location data from the Global Biodiversity Information Facility.