Biodiversity Information Science and Standards (BISS), ISSN 2535-0897, Pensoft Publishers
DOI: 10.3897/tdwgproceedings.1.20232
Conference Abstract
Symposium: Using Big Data Techniques to Cross Dataset Boundaries - Integration and Analysis of Multiple Datasets

TAXREF-LD: A Reference Thesaurus for Biodiversity on the Web of Linked Data

Franck Michel (1) (franck.michel@cnrs.fr, https://orcid.org/0000-0001-9064-0463), Catherine Faron-Zucker (1), Sandrine Tercerie (2), Olivier Gargominy (2)

(1) Université Côte d'Azur, Inria, CNRS, I3S, Sophia Antipolis, France
(2) Muséum national d'Histoire naturelle, Paris, France
Corresponding author: Franck Michel (franck.michel@cnrs.fr).
Biodiversity Information Science and Standards 1: e20232 (2017). Received: 12 August 2017. Published: 14 August 2017.
© 2017 Franck Michel, Catherine Faron-Zucker, Sandrine Tercerie, Olivier Gargominy. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Started in the early 2000s, the Web of Data has now become a reality [Bizer 2009]. It keeps growing through the relentless publication and interlinking of data sets spanning various domains of knowledge. Building upon the Resource Description Framework (RDF), this new layer of the Web implements the Linked Data paradigm [Heath and Bizer 2011] to connect and share pieces of data from disparate data sets. It thereby enables the integration of distributed and heterogeneous data sets, giving rise to an unprecedented worldwide knowledge base.
Taxonomic registers are key tools to help us comprehend the diversity of nature. They are the backbone for integrating independent data sources, and they help shape strategies for biodiversity and natural heritage conservation. As such, they naturally stand out as potential contributors to the Web of Data. Several international initiatives on taxonomic thesauri, such as the NCBI Organismal Classification [Federhen 2012], the AGROVOC Multilingual Agricultural Thesaurus [Caracciolo et al. 2013] and the Encyclopedia of Life [Blaustein 2009], have already made this move towards the Web of Data.
In this talk, we will present ongoing work related to TAXREF [Gargominy et al. 2016], the taxonomic register for fauna, flora and fungi, maintained and distributed by the National Museum of Natural History in Paris (France). TAXREF registers all species inventoried in metropolitan France and its overseas territories, in a controlled hierarchy of over 500,000 scientific names. Our goal is to publish TAXREF on the Web of Data, as a data set denoted TAXREF-LD, while adhering to standards and best practices for the publication of Linked Open Data (LOD) [Farias Lóscio et al. 2017].
The publication of TAXREF-LD as LOD required tackling several challenges. Far beyond a mere automatic translation of the TAXREF database into LOD standards, the key point of the reported endeavor was the design of a model able to account for the two coexisting yet distinct realities underlying TAXREF, namely the nomenclature and the taxonomy. At the nomenclatural level, each scientific name is represented by a concept, expressed in the Simple Knowledge Organization System (SKOS) vocabulary [Miles and Bechhofer 2009], along with an authority and a taxonomic rank. At the taxonomic level, a species is represented by a class in the Web Ontology Language (OWL) [Schneider et al. 2012] whose properties are the species traits (habitat, biogeographical status, conservation status, etc.). The two levels are connected by links between a species and its associated names (the valid name and existing synonyms). Note that the modelling applies not only to species but also to any other taxonomic rank (genus, family, etc.).
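The two-level model described above can be sketched as a small set of RDF-style triples. This is only an illustration under stated assumptions, not the actual TAXREF-LD vocabulary: the RDF, OWL and SKOS namespaces are the real W3C ones, but the `http://example.org/taxref-sketch/` namespace, the `authority`, `rank`, `hasValidName`, `hasSynonym` and `habitat` properties, and the names "Aus bus" and "Aus cus" are hypothetical placeholders.

```python
# Real W3C namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/taxref-sketch/"


def name_concept(name_id, label, authority, rank):
    """Nomenclatural level: one SKOS concept per scientific name."""
    uri = f"{EX}name/{name_id}"
    triples = {
        (uri, RDF + "type", SKOS + "Concept"),
        (uri, SKOS + "prefLabel", label),
        (uri, EX + "authority", authority),
        (uri, EX + "rank", rank),
    }
    return uri, triples


def taxon_class(taxon_id, valid_name_uri, synonym_uris, traits):
    """Taxonomic level: one OWL class per taxon, linked to its names."""
    uri = f"{EX}taxon/{taxon_id}"
    triples = {
        (uri, RDF + "type", OWL + "Class"),
        (uri, EX + "hasValidName", valid_name_uri),
    }
    triples |= {(uri, EX + "hasSynonym", s) for s in synonym_uris}
    triples |= {(uri, EX + trait, value) for trait, value in traits.items()}
    return uri, triples


# Placeholder data: two names for one hypothetical species.
valid_uri, valid_triples = name_concept(1, "Aus bus", "Smith, 1900", "species")
syn_uri, syn_triples = name_concept(2, "Aus cus", "Jones, 1910", "species")
taxon_uri, taxon_triples = taxon_class(
    1, valid_uri, [syn_uri], {"habitat": "marine"}
)

graph = valid_triples | syn_triples | taxon_triples
for s, p, o in sorted(graph):
    print(s, p, o)
```

Keeping names and taxa as separate resources means a statement about a name (its authority, its rank) never has to be rewritten when a statement about the taxon (its traits, its valid name) changes.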
This model has several key advantages. First, it is relevant to biologists as well as computer scientists. Indeed, it agrees with three centuries of thinking on nomenclatural codes [Ride et al. 1999, McNeill et al. 2012] while, at the same time, fitting the philosophy underpinning SKOS and OWL: the nomenclatural level allows navigating a hierarchy of concepts representing scientific names, and at the taxonomic level, the OWL classes represent the sets of individuals sharing common traits. Second, the model enables drawing links with other data sources published on the Web of Data, which may represent either nomenclatural or taxonomic information. Third, the taxonomy evolves frequently as new species are discovered and the scientific consensus changes. For instance, a name may be considered the valid name of a species at one time and a synonym at another. The distinction between the nomenclatural and taxonomic levels, alongside an appropriate Uniform Resource Identifier (URI) naming scheme for names and taxa, makes the model flexible enough to accommodate such changes.
Furthermore, our goal in this talk is not only to present the work achieved but, more importantly, to engage in a discussion with the stakeholders of the community, whether they are data consumers or producers of sibling classifications concerned with the publication of LOD, about the data integration scenarios that may arise from the availability of such a large, distributed knowledge base.
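To illustrate how separating names from taxa can absorb a taxonomic revision, the following minimal sketch swaps a valid name and a synonym. It assumes hypothetical `hasValidName` and `hasSynonym` properties under a made-up `http://example.org/taxref-sketch/` namespace and placeholder names; it is not the URI scheme actually used by TAXREF-LD.

```python
# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/taxref-sketch/"
HAS_VALID_NAME = EX + "hasValidName"
HAS_SYNONYM = EX + "hasSynonym"
# Real SKOS property.
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def revise_valid_name(triples, taxon, new_valid):
    """Promote a synonym to valid name and demote the former valid name.

    Only the links between the taxon and its names are rewritten;
    the URIs of the taxon and of both names stay the same.
    """
    old_valid = next(
        o for s, p, o in triples if s == taxon and p == HAS_VALID_NAME
    )
    if old_valid == new_valid:
        return set(triples)  # nothing to revise
    removed = {
        (taxon, HAS_VALID_NAME, old_valid),
        (taxon, HAS_SYNONYM, new_valid),
    }
    added = {
        (taxon, HAS_VALID_NAME, new_valid),
        (taxon, HAS_SYNONYM, old_valid),
    }
    return (set(triples) - removed) | added


taxon = EX + "taxon/1"
name_a = EX + "name/1"
name_b = EX + "name/2"

before = {
    (name_a, SKOS_PREF_LABEL, "Aus bus"),  # placeholder name
    (name_b, SKOS_PREF_LABEL, "Aus cus"),  # placeholder name
    (taxon, HAS_VALID_NAME, name_a),
    (taxon, HAS_SYNONYM, name_b),
}
after = revise_valid_name(before, taxon, name_b)

# Triples that differ between the two versions of the data set.
changed = before ^ after
print(len(changed), "triples changed")
```

In this sketch, the revision touches only the linking triples, so any external data set that points at the name or taxon URIs keeps resolving to the same resources.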
Keywords: Linked Data, Taxonomy, Data Integration
Presented at: TDWG 2017 Annual Conference (TDWG 2017), 1-6 October 2017, Ottawa, Canada. Conference theme: Data Integration in a Big Data Universe: Associating Occurrences with Genes, Phenotypes, and Environments
Presenting author
Franck MICHEL is a research engineer at Université Côte d'Azur, France. His research topics notably concern the integration and federation of heterogeneous data sources using Semantic Web ontologies, and their publication on the Web of Data.
Olivier GARGOMINY is a research engineer at the National Museum of Natural History in France. He is responsible for the French national taxonomic register for fauna, flora and fungi (named TAXREF) and the knowledge database associated with this taxonomic register (status, biological interactions, etc.).
References
Bizer C (2009) The Emerging Web of Linked Data. 24 (5): 87-92.
Blaustein R (2009) The Encyclopedia of Life: Describing Species, Unifying Biology. 59 (7): 551-556.
Caracciolo C, Stellato A, Morshed A, Johannsen G, Rajbhandari S, Jaques Y, Keizer J (2013) The AGROVOC linked dataset. 4 (3): 341-348.
Farias Lóscio B, Burle C, Calegari N (2017) Data on the Web Best Practices.
Federhen S (2012) The NCBI Taxonomy database. 40 (D1): D136-D143.
Gargominy O, Tercerie S, Régnier C, Ramage T, Schoelink C, Dupont P, Vandel E, Daszkiewicz P, Poncet L (2016) TAXREF v10.0, référentiel taxonomique pour la France: méthodologie, mise en œuvre et diffusion.
Heath T, Bizer C (2011) 1st edition. Morgan & Claypool.
McNeill J, Barrie FR, Buck WR, Demoulin V, Greuter W, Hawksworth DL, Herendeen PS, Knapp S, Marhold K, Prado J, Prud'homme Van Reine WF, Smith GF, Wiersema JH, Turland NJ (2012) International Association for Plant Taxonomy.
Miles A, Bechhofer S (2009) SKOS Simple Knowledge Organization System Namespace Document.
Ride WDL, Cogger HG, Dupuis C, Kraus O, Minelli A, Thompson FC, Tubbs PK (1999) International Trust for Zoological Nomenclature.
Schneider M, Carroll J, Herman I, Patel-Schneider PF (2012) OWL 2 Web Ontology Language RDF-Based Semantics (Second Edition).