Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Vishnukumar Balavenkataraman Kadhirvelu (kadhirvelu@ebi.ac.uk)
Received: 17 Aug 2022 | Published: 23 Aug 2022
© 2022 Vishnukumar Balavenkataraman Kadhirvelu, Kessy Abarenkov, Allan Zirk, Joana Paupério, Guy Cochrane, Suran Jayathilaka, Olaf Bánki, Jerry Lanfear, Filipp Ivanov, Timo Piirmann, Raivo Pöhönen, Urmas Kõljalg
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Balavenkataraman Kadhirvelu V, Abarenkov K, Zirk A, Paupério J, Cochrane G, Jayathilaka S, Bánki O, Lanfear J, Ivanov F, Piirmann T, Pöhönen R, Kõljalg U (2022) Enabling Community Curation of Biological Source Annotations of Molecular Data Through PlutoF and the ELIXIR Contextual Data Clearinghouse. Biodiversity Information Science and Standards 6: e93595. https://doi.org/10.3897/biss.6.93595
Advances in sequencing technologies have greatly contributed to the documentation of Earth’s biodiversity. However, to realize the full potential of molecular resources for biodiversity, sequence data must be well linked to their biological source, contributing to a network of connected data across the biodiversity research cycle. This requires a foundation of well-structured and accessible annotations in molecular sequence repositories.
The International Nucleotide Sequence Database Collaboration (INSDC), of which the European Nucleotide Archive (ENA) is the European node, holds a large number of annotations associated with sequence data that relate to their biological source (e.g., specimens in natural history collections). However, for a number of records, these annotations may be incomplete (e.g., missing voucher information), ambiguous, or even inaccurate.
Therefore, we have implemented a workflow that allows third-party annotations to be attached to sequence and sample records using two existing services, the PlutoF platform and the ELIXIR Contextual Data ClearingHouse. This work was developed within the scope of the BiCIKL (Biodiversity Community Integrated Knowledge Library) project, which aims to establish open science practices in the biodiversity domain.
PlutoF is an online data management platform that also provides computing services for biology-related research. It allows registered users to enter their own data and to access public data at INSDC. Users can enter and manage a range of data, such as taxonomic classifications and occurrences. The platform also includes a module for adding third-party annotations (on material source, taxonomic identification, etc.) linked to specimens or sequence records. This module was already in use by the UNITE community for annotation of INSDC rDNA Internal Transcribed Spacer sequence datasets (
The workflow developed is shown in Fig.
Workflow for third-party annotations added and verified in PlutoF and submitted to the ELIXIR Clearinghouse.
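As a rough illustration of the added, verified, submitted sequence named in the caption above, the following minimal Python sketch models an annotation moving through those three states. It is not the PlutoF or Clearinghouse implementation: the class, the field names (`record_id`, `attribute`, `value`, `curator`), the status labels, and the JSON layout are all hypothetical, and no real service is contacted.

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json


class Status(Enum):
    # Hypothetical status labels mirroring the caption's three steps.
    ADDED = "added"          # entered by a curator
    VERIFIED = "verified"    # checked before release
    SUBMITTED = "submitted"  # handed on to the clearinghouse


@dataclass
class Annotation:
    # All field names are assumptions made for illustration only.
    record_id: str   # identifier of the sequence or sample record
    attribute: str   # attribute being annotated, e.g. "specimen_voucher"
    value: str       # proposed value for that attribute
    curator: str     # who proposed the annotation
    status: Status = Status.ADDED


def verify(annotation: Annotation) -> Annotation:
    """Mark a newly added annotation as verified."""
    if annotation.status is not Status.ADDED:
        raise ValueError("only newly added annotations can be verified")
    annotation.status = Status.VERIFIED
    return annotation


def submit(annotation: Annotation) -> str:
    """Serialize a verified annotation to JSON and mark it as submitted.

    Returns the JSON payload rather than sending it anywhere.
    """
    if annotation.status is not Status.VERIFIED:
        raise ValueError("annotation must be verified before submission")
    annotation.status = Status.SUBMITTED
    payload = asdict(annotation)
    payload["status"] = annotation.status.value  # enum -> plain string
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    note = Annotation("RECORD_0001", "specimen_voucher",
                      "EXAMPLE:12345", "curator@example.org")
    print(submit(verify(note)))
```

The point of the sketch is the ordering constraint: an annotation that has not been verified cannot be submitted, which mirrors the verification step placed between entry and submission in the workflow.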
Overall, we expect this tool to enrich the metadata associated with sequence records, thereby strengthening the links between molecular and biodiversity resources and enabling sequencing data to deliver their full potential for biodiversity conservation.
third-party annotations, data management, linking data, BiCIKL
Vishnukumar Balavenkataraman Kadhirvelu
TDWG 2022
The BiCIKL project receives funding from the European Union's Horizon 2020 Research and Innovation Action under grant agreement No 101007492. This work was also funded by ELIXIR, the research infrastructure for life-science data.
BiCIKL - Biodiversity Community Integrated Knowledge Library