Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Joana Paupério (joanap@ebi.ac.uk)
Received: 27 Aug 2024 | Published: 28 Aug 2024
© 2024 Joana Paupério, Vikas Gupta, Vishnukumar Balavenkataraman Kadhirvelu, Kessy Abarenkov, Wouter Addink, Donat Agosti, Olaf Bánki, Josephine Burgin, Marcus Ernst, Tobias Frøslev, Quentin Groom, Anton Güntsch, Suran Jayathilaka, Sam Leeflang, Urmas Kõljalg, Joe Miller, Guido Sautter, Lyubomir Penev, Guy Cochrane
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Paupério J, Gupta V, Balavenkataraman Kadhirvelu V, Abarenkov K, Addink W, Agosti D, Bánki O, Burgin J, Ernst M, Frøslev T, Groom Q, Güntsch A, Jayathilaka S, Leeflang S, Kõljalg U, Miller J, Sautter G, Penev L, Cochrane G (2024) Linking Between Molecular and Biodiversity Data: A BiCIKL Perspective. Biodiversity Information Science and Standards 8: e135646. https://doi.org/10.3897/biss.8.135646
Molecular sequencing data generation is being driven by global and regional efforts to discover, understand and monitor biodiversity. To fully explore this data in biodiversity research we need a network of connected data resources, linking sequence data with natural history collections, taxonomy and literature. The BiCIKL project (Biodiversity Community Integrated Knowledge Library,
Connecting biodiversity and molecular data along the biodiversity research cycle requires a foundation of well-structured and rich metadata in the molecular sequence databases. Referencing the physical specimens is important because it provides context about the source of the material used to generate the molecular sequence data, including its origin and species identification. To connect biodiversity and molecular data, we developed tools and workflows for improving and standardising metadata, for federated searches, and for validating specimen references in sequence data. These include the SpASe tool, which enables the discovery of links between natural history collections and sequences, and the European Nucleotide Archive Source Attribute Helper API, which facilitates the construction of specimen attributes in a structured format. This work was done in close collaboration with DiSSCo (Distributed System of Scientific Collections) and with biodiversity genomics projects such as Biodiversity Genomics Europe (BGE).
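As an illustration of what a structured specimen reference can look like, the sketch below builds and parses a voucher string in the colon-separated form used by the INSDC `specimen_voucher` qualifier (optional institution code, optional collection code, then the specimen identifier). It is a minimal sketch and not the Source Attribute Helper API or the SpASe tool: the function names, the `SpecimenVoucher` type and the example codes are hypothetical, and no check against a registry of institution codes is attempted.

```python
from typing import NamedTuple, Optional


class SpecimenVoucher(NamedTuple):
    """Hypothetical container for the three parts of a specimen reference."""
    institution_code: Optional[str]
    collection_code: Optional[str]
    specimen_id: str


def build_voucher(specimen_id: str,
                  institution_code: Optional[str] = None,
                  collection_code: Optional[str] = None) -> str:
    """Join the parts into 'institution:collection:specimen_id' form."""
    if not specimen_id:
        raise ValueError("a specimen identifier is required")
    if collection_code and not institution_code:
        raise ValueError("a collection code requires an institution code")
    parts = [p for p in (institution_code, collection_code, specimen_id) if p]
    return ":".join(parts)


def parse_voucher(value: str) -> SpecimenVoucher:
    """Split a voucher string into its parts.

    Simplifying assumption: at most two colons act as separators, so any
    further colon is kept as part of the specimen identifier.
    """
    parts = [p.strip() for p in value.split(":", 2)]
    if not parts[-1]:
        raise ValueError("a specimen identifier is required")
    if len(parts) == 1:
        return SpecimenVoucher(None, None, parts[0])
    if len(parts) == 2:
        return SpecimenVoucher(parts[0], None, parts[1])
    return SpecimenVoucher(parts[0], parts[1], parts[2])


if __name__ == "__main__":
    # 'EXMPL' and 'Herb' are made-up codes used only for illustration.
    text = build_voucher("12345", institution_code="EXMPL", collection_code="Herb")
    print(text)
    print(parse_voucher(text))
```

Keeping the institution, collection and specimen identifier as separate fields is what allows a sequence record to be matched against a collection catalogue; a free-text voucher cannot be resolved in the same way.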
Furthermore, we enabled community curation of biological source annotations such as specimen references in sequence data through the PlutoF platform and the ELIXIR Contextual Data Clearinghouse (
Overall, the project has contributed significantly to strengthening the connections between the biodiversity and genomics communities, moving towards higher data integration and interoperability. Structured, enriched, accessible and linked sequence data will provide a strong foundation for applying biodiversity knowledge in response to global challenges such as biodiversity loss, ecosystem change and food security. Beyond BiCIKL, we will continue our work as a community to promote a culture of FAIR linked molecular data, towards a fully integrated biodiversity knowledge ecosystem.
Keywords: sequence data, specimens, taxonomy, literature, FAIR, biodiversity community
Presenting author: Joana Paupério
Presented at: SPNHC-TDWG 2024
The BiCIKL project received funding from the European Union's Horizon 2020 Research and Innovation Action under grant agreement No 101007492.