Biodiversity Information Science and Standards (BISS), ISSN 2535-0897, Pensoft Publishers
DOI: 10.3897/biss.4.59199
Conference Abstract
SYM05 - Using collections to mitigate and prevent zoonotic disease: data mobilization and integration

Liberating Biodiversity Data From COVID-19 Lockdown: Toward a knowledge hub for mammal host-virus information

Nathan S Upham (1), nathan.upham@asu.edu, https://orcid.org/0000-0001-5412-9342
Donat Agosti (2), https://orcid.org/0000-0001-9286-1200
Jorrit Poelen (3), https://orcid.org/0000-0003-3138-4118
Lyubomir Penev (4), https://orcid.org/0000-0002-2186-5033
Deborah Paul (5), https://orcid.org/0000-0003-2639-7520
DeeAnn Marie Reeder (6), https://orcid.org/0000-0001-8651-2012
Nancy B. Simmons (7)
Gabor Csorba (8), https://orcid.org/0000-0001-5720-4600
Quentin Groom (9), https://orcid.org/0000-0002-0596-5376
Mariya Dimitrova (4), https://orcid.org/0000-0002-8083-6048
Joseph T Miller (10)

(1) Arizona State University, Tempe, United States of America
(2) Plazi, Bern, Switzerland
(3) Ronin Institute, Berkeley, United States of America
(4) Pensoft Publishers & Bulgarian Academy of Sciences, Sofia, Bulgaria
(5) Florida State University, Tallahassee, United States of America
(6) Bucknell University, Lewisburg, United States of America
(7) American Museum of Natural History, New York, United States of America
(8) Hungarian Natural History Museum, Budapest, Hungary
(9) Meise Botanic Garden, Meise, Belgium
(10) Global Biodiversity Information Facility, Copenhagen, Denmark
Corresponding author: Nathan S Upham (nathan.upham@asu.edu).
Received: 01 Oct 2020 | Published: 09 Oct 2020 | Volume 4: e59199
Copyright: Nathan S Upham, Donat Agosti, Jorrit Poelen, Lyubomir Penev, Deborah Paul, DeeAnn Marie Reeder, Nancy B. Simmons, Gabor Csorba, Quentin Groom, Mariya Dimitrova, Joseph T Miller
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
A deep irony of COVID-19 likely originating from a bat-borne coronavirus (Boni et al. 2020) is that the global lockdown to quell the pandemic also locked up physical access to much basic knowledge regarding bat biology. Digital access to data on the ecology, geography, and taxonomy of potential viral reservoirs, from Southeast Asian horseshoe bats and pangolins to North American deer mice, was suddenly critical for understanding the disease's emergence and spread. However, much of this information lay inside rare books and personal files rather than as open, linked, and queryable resources on the internet. Even the world's experts on mammal taxonomy and zoonotic disease could not retrieve their data from shuttered laboratories. We were caught unprepared. Why, in this digitally connected age, were such fundamental data describing life on Earth not already freely accessible online?
Understanding why biodiversity science was unprepared, and how to fix it before the next pandemic, has been the focus of our COVID-19 Taskforce (organized by CETAF and DiSSCo) since April 2020, and that work is continuing. We are a group of museum-based and academic scientists with the goal of opening the rich ecological data stored in natural history collections to the research public. This information is rooted in what may seem an unlikely location: taxonomic names and their historical usages, which are the keys for searching literature and extracting linked ecological data (Fig. 1). This has been the core motivation of our group, enabled by the pioneering efforts of Plazi (Agosti and Egloff 2009) to build tools for literature digitization, extraction, and parsing (e.g., Synospecies, Ocellus), without which biodiversity science would be even less prepared. Our group led efforts to build an additional pipeline from Plazi to the Biodiversity Literature Repository at Zenodo, a free and unlimited data repository (Agosti et al. 2019), and then to GloBI, an open-source database of biotic interactions (Poelen et al. 2014, GloBI 2020). We also developed a direct integration from Pensoft Journals to GloBI, leveraging that publisher's indexing of computer-readable terms (called semantic metadata; Senderov et al. 2018) to extract mammal host and virus information.
Overall, considerable progress was made. In total, 85,492 new interactions were added to GloBI from 14 April to 21 May 2020 (see entire dataset on Zenodo: Poelen et al. 2020). Of those, 28,839 interactions remain when the data are subset to "hasHost", "hostOf", "pathogenOf", and "virus", and 4,101 unique name combinations remain after accounting for mammal species synonymies (from Meyer et al. 2015). Those interactions involve 892 species of mammals and 1,530 unique virus names, compared with 754 mammals and 586 viruses in the most recent data synthesis (Olival et al. 2017). While these liberated data may still include redundancies, they demonstrate the value of our approach and the expanse of known but digitally unconnected data that remains locked in publications.
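The subsetting and synonym-collapsing steps described above can be illustrated with a minimal sketch. It is not the Taskforce's actual workflow: the record layout, the function names, and all taxon and virus names below are hypothetical placeholders, and treating "virus" as a keyword required in the virus-side name is only one possible reading of how the subset was made.

```python
from collections import namedtuple

# Hypothetical record layout: one row per indexed interaction claim.
Interaction = namedtuple("Interaction", ["source", "interaction_type", "target"])

# Interaction types named in the text; source is the virus for the first two,
# and the host for "hostOf".
VIRUS_IS_SOURCE = {"hasHost", "pathogenOf"}
VIRUS_IS_TARGET = {"hostOf"}


def to_virus_host_pair(record):
    """Return (virus_name, host_name), or None if the type is not host-virus."""
    if record.interaction_type in VIRUS_IS_SOURCE:
        return record.source, record.target
    if record.interaction_type in VIRUS_IS_TARGET:
        return record.target, record.source
    return None


def unique_virus_host_pairs(records, host_synonyms):
    """Subset to host-virus records, then collapse host synonyms.

    host_synonyms maps a synonym to its accepted name (an assumed structure).
    Requiring "virus" in the virus-side name is an assumption of this sketch.
    """
    pairs = set()
    for record in records:
        pair = to_virus_host_pair(record)
        if pair is None:
            continue
        virus, host = pair
        if "virus" not in virus.lower():
            continue
        pairs.add((virus, host_synonyms.get(host, host)))
    return pairs


# Invented toy data, for illustration only.
records = [
    Interaction("Examplevirus A", "hasHost", "Hostus alpha"),
    Interaction("Hostus alphus", "hostOf", "Examplevirus A"),  # synonym of H. alpha
    Interaction("Hostus beta", "eats", "Plantus gamma"),       # not host-virus
    Interaction("Examplevirus B", "pathogenOf", "Hostus beta"),
]
host_synonyms = {"Hostus alphus": "Hostus alpha"}

pairs = unique_virus_host_pairs(records, host_synonyms)
print(len(pairs))
```

In this toy case the first two records describe the same virus-host pair once the synonym is resolved, which is the kind of redundancy that separates the raw interaction count from the count of unique name combinations.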
We can liberate host-virus data from publications, but doing so is expensive and does not scale to the continued influx of new articles that are inadequately digitized. Our efforts make it clear that Pensoft-style semantic publishing should be expanded to all major journals. The pandemic has created an opportunity for re-thinking the way we do science in the digital age. Thankfully, our future is not the past, so we do not have to keep wasting resources to digitally 'rediscover' biodiversity knowledge. We collectively call for changes to the publishing paradigm, so that research findings are directly accessible, citable, discoverable, and reusable for creating complete forms of digital knowledge.
Keywords
zoonotic disease risk, spillover, virus, mammal, bat, taxonomy, semantic publishing
Conference
TDWG 2020 annual conference (TDWG 2020), A Virtual Conference. TDWG 2020 will be a virtual conference divided into working sessions (Sep 21-25) followed by a second week dedicated to dissemination and sharing (Oct 19-23).
Presenting author
Nathan S Upham
Presented at
TDWG 2020
References
Agosti D, Egloff W (2009) Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2 (1). http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-2-53. DOI: 10.1186/1756-0500-2-53
Agosti D, Catapano T, Sautter G, Kishor P, Nielsen L, Ioannidis-Pantopikos A, Bigarella C, Georgiev T, Penev L, Egloff W (2019) Biodiversity Literature Repository (BLR), a repository for FAIR data and publications. Biodiversity Information Science and Standards 3. https://zenodo.org/record/3257816#.XqzzfJopC7M. DOI: 10.3897/biss.3.37197
Boni MF, Lemey P, Jiang X, Lam TT-Y, Perry BW, Castoe TA, Rambaut A, Robertson DL (2020) Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature Microbiology: 1-10. https://www.nature.com/articles/s41564-020-0771-4. DOI: 10.1038/s41564-020-0771-4
GloBI (2020) Plazi-Zenodo-GloBI integration. https://www.globalbioticinteractions.org/plazi-zenodo/
Meyer C, Kreft H, Guralnick R, Jetz W (2015) Global priorities for an effective information basis of biodiversity distributions. Nature Communications 6. https://www.nature.com/articles/ncomms9221. DOI: 10.1038/ncomms9221
Olival KJ, Hosseini PR, Zambrana-Torrelio C, Ross N, Bogich TL, Daszak P (2017) Host and viral traits predict zoonotic spillover from mammals. Nature 546 (7660): 646-650. https://www.nature.com/articles/nature22975. DOI: 10.1038/nature22975
Poelen JH, Simons JD, Mungall CJ (2014) Global biotic interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics 24: 148-159. http://www.sciencedirect.com/science/article/pii/S1574954114001125. DOI: 10.1016/j.ecoinf.2014.08.005
Poelen J, Upham N, Agosti D, Ruschel T, Guidoti M, Reeder D, Simmons N, Penev L, Dimitrova M, Csorba G, Groom Q, Willoughby A (2020) CETAF-DiSCCo/COVID19-TAF biodiversity-related knowledge hub working group: indexed biotic interactions and review summary. Zenodo. Type: dataset. https://zenodo.org/record/3839098#.XtD-C8YpBTY. DOI: 10.5281/zenodo.3839098
Senderov V, Simov K, Franz N, Stoev P, Catapano T, Agosti D, Sautter G, Morris RA, Penev L (2018) OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system. Journal of Biomedical Semantics 9 (1). DOI: 10.1186/s13326-017-0174-5
Figure 1. Taxonomic names and their usages are the key for unlocking host-virus interaction data. Information flows from digitizing taxonomic treatments containing species names and their historical usages (Plazi and Zenodo), to searching biodiversity literature for data linked to names, to connecting those biotic interactions in an ecological network (GloBI). Data can also flow directly into GloBI from Pensoft-style journals that publish data with computer-readable tags.