Biodiversity Information Science and Standards :
Conference Abstract
|
Corresponding author: Vikas Gupta (vikasg@ebi.ac.uk)
Received: 01 Aug 2022 | Published: 02 Aug 2022
© 2022 Vikas Gupta, Joana Paupério, Josephine Burgin, Suran Jayathilaka, Guy Cochrane
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Gupta V, Paupério J, Burgin J, Jayathilaka S, Cochrane G (2022) The ENA Source Attribute Helper: An API for improved biological source data. Biodiversity Information Science and Standards 6: e91118. https://doi.org/10.3897/biss.6.91118
|
Metadata management for sequence data is essential for the accurate description of Earth’s biodiversity. Within metadata attributes, those that reference the biological sources of sequences and samples and allow linking to the specimen or sample of origin are fundamental for facilitating connections between molecular biology, taxonomy, systematic biology and biodiversity research, increasing the discoverability and usability of data by researchers worldwide.
Sequence data is publicly archived at the International Nucleotide Sequence Database Collaboration (INSDC) that includes the National Centre for Biotechnology Information (NCBI), the DNA Data Bank of Japan (DDBJ) and the European Nucleotide Archive (ENA). Sequences stored at INSDC have associated a considerable range of metadata, including attributes related to its biological source, such as references to natural history collections or culture collections. But, these source attributes are not always submitted or may be incomplete, limiting the association of the sequence records to the original source material, hampering further data connections (e.g., biological data associated with the voucher or species distribution data). Therefore, we have developed the ENA Source Attribute Helper API, a tool that aims to assist users on the submission of accurate attributes referring to the biological source of samples and sequence data. This tool was developed within the scope of BiCIKL (Biodiversity Community Integrated Knowledge Library) (
The first version of the tool was designed to support correct annotation of the attributes that identify the source material from which the sample or sequence were obtained, namely /specimen_voucher, /culture_collection, and /biomaterial (
Since the submission of the biological source attributes to the INSDC may be performed both when data is initially uploaded or on following updates using a variety of tools, we developed the API as an open source tool that is publicly accessible and may be used as a free-standing service. The API is built using Representational State Transfer (REST) API Architecture and it is designed to use the data available in the NCBI BioCollections (
The API is designed in a way that it can be extended easily for any future enhancements and initially expected to promote and support the submission and any subsequent curation of better structured and more richly described source data. We expect this tool to contribute to better connected biodiversity data and hence provide a stronger foundation to strengthen the value of natural history collections, taxonomic expertise, and biodiversity knowledge.
ENA submission tools, validation functions, specimen voucher, culture collection, bio material, NCBI biocollections
Vikas Gupta
TDWG 2022
BiCIKL project receives funding from the European Union's Horizon 2020 Research and Innovation Action under grant agreement No 101007492.
BiCIKL - Biodiversity Community Integrated Knowledge Library