Biodiversity Information Science and Standards : Conference Abstract
Bridging a Gap in Metabarcoding Research: The ASV Table Registry
Christian Bräunig, Björn Quast, Peter Grobe
‡ Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Koenig, Bonn, Germany
Open Access

Abstract

Metabarcoding is a tool for routinely identifying species in environmental mass samples and thereby analyzing their species composition. When well-curated reference data are available, metabarcoding outperforms traditional species identification by human experts in volume, speed, and quality.

Therefore, metabarcoding can be seen as the future standard method for all areas of biological research where species occurrence and distribution are in question, e.g., ecological research or monitoring projects (Porter and Hajibabaei 2018).

A common outcome of metabarcoding research is an Amplicon Sequence Variant (ASV) table (Callahan et al. 2017). These tables combine the extracted sequences of all sampling plots with the occurrences of each sequence within a single plot. To identify the species, each sequence is searched against one or more reference databases that hold sequences and their known taxon identifications, e.g., the Barcode of Life Data System (BOLD) or the German Barcode of Life library (GBOL). The sequence searches use tools such as BLAST, the BOLD identification engine, or vsearch. The taxa found and their taxonomy are added to the ASV tables as taxon assignments.
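The structure described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the registry's actual data model: the class `AsvRow`, the function `plots_with_taxon`, the plot IDs, the sequences, and the taxon names are all hypothetical, and occurrences are assumed to be stored as read counts per plot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsvRow:
    """One amplicon sequence variant with its occurrences per sampling plot."""
    sequence: str
    counts: dict[str, int]              # plot ID -> read count (assumed representation)
    taxon: Optional[str] = None         # taxon assignment, None if no match was found
    reference_db: Optional[str] = None  # database that supplied the assignment


def plots_with_taxon(table: list[AsvRow], taxon: str) -> set[str]:
    """Return the plots in which any sequence assigned to `taxon` occurs."""
    return {
        plot
        for row in table
        if row.taxon == taxon
        for plot, n in row.counts.items()
        if n > 0
    }


# Hypothetical example data: two variants of one species and one unassigned sequence.
table = [
    AsvRow("ACGTTGCA", {"plot_A": 12, "plot_B": 0}, "Apis mellifera", "BOLD"),
    AsvRow("ACGTTGCC", {"plot_A": 0, "plot_B": 7}, "Apis mellifera", "GBOL"),
    AsvRow("TTGACCAA", {"plot_A": 3, "plot_B": 5}),  # not yet assigned to a taxon
]
```

Keeping the assignment separate from the sequence and its counts reflects the point made in the text: the sequences and occurrences are fixed observations, while the taxon assignment is an annotation that may be replaced later.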

The number and precision of taxon assignments will increase with the growth of available sequences and quality of identifications in reference databases over time (Weigand et al. 2019). The introduction of new marker sequences and improvements in search tools will further enhance the taxon assignments. Thus, the taxon assignments in ASV tables are subject to change.

Projects that aim to build large-scale species inventories, such as GBOL, and monitoring programs, such as the Automated Multisensor Stations for Monitoring of BioDiversity (Wägele et al. 2022), quickly produce data sets with thousands of sequences from numerous locations.

Currently, most ASV tables are stored as supplements to publications or in private repositories. This makes analysis across multiple research projects difficult and error-prone, as sequences and their taxon assignments are often not accessible. Efforts such as MGnify, the metagenomics resource of the European Bioinformatics Institute, serve the need for uploading and annotating environmental DNA samples (Mitchell et al. 2017), but a registry for ASV tables with complete data life cycles is lacking.

To fill this gap, we are developing an ASV Table Registry as part of the German Barcode of Life III - Dark Taxa project. The registry allows users to:

  • register ASV tables and sequences

  • upload and manage ASV tables with versioning

  • publish ASV tables with DOIs

  • search by sequences, taxa, and occurrence data

  • retrieve data through an API

  • assign taxonomic names with various tools and reference databases

  • keep track of the applied search methods and parameters

The data life cycle of an uploaded ASV table consists of several draft versions (each re-annotation with the identification pipeline creates a new draft version) and eventually a published version with a DOI. New draft versions can be created from the published version, then re-annotated and published again. Tracking former taxon assignments allows researchers to re-evaluate data from earlier studies, compare them, and add new results. The ASV Table Registry aims to make ASV tables FAIR (Findable, Accessible, Interoperable, and Reusable) and to foster their shared use in research projects.
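The life cycle above can be sketched as a small version history. This is a hedged illustration only and does not reproduce the registry's implementation: the names `Version`, `RegisteredTable`, `reannotate`, `publish`, and `changed_assignments` are hypothetical, the method description is free text, and the DOI in the usage example is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    number: int
    assignments: dict[str, str]  # sequence -> taxon name
    method: str                  # search tool and parameters used (free text here)
    doi: Optional[str] = None    # set only once the version is published


@dataclass
class RegisteredTable:
    versions: list[Version] = field(default_factory=list)

    def reannotate(self, assignments: dict[str, str], method: str) -> Version:
        """Each re-annotation creates a new draft; earlier versions are kept."""
        draft = Version(len(self.versions) + 1, dict(assignments), method)
        self.versions.append(draft)
        return draft

    def publish(self, doi: str) -> Version:
        """Publish the latest draft by attaching a DOI to it."""
        if not self.versions:
            raise ValueError("no draft to publish")
        latest = self.versions[-1]
        if latest.doi is not None:
            raise ValueError("latest version is already published")
        latest.doi = doi
        return latest

    def changed_assignments(self, old: int, new: int) -> dict[str, tuple]:
        """Sequences whose taxon assignment differs between two versions."""
        a = self.versions[old - 1].assignments
        b = self.versions[new - 1].assignments
        return {
            seq: (a.get(seq), b.get(seq))
            for seq in a.keys() | b.keys()
            if a.get(seq) != b.get(seq)
        }


# Usage with placeholder values: publish a first annotation, then re-annotate.
t = RegisteredTable()
t.reannotate({"ACGT": "Diptera"}, "vsearch, hypothetical parameters")
t.publish("10.0000/placeholder-v1")  # placeholder DOI, not a real identifier
t.reannotate({"ACGT": "Drosophila melanogaster"}, "BLAST, hypothetical parameters")
diff = t.changed_assignments(1, 2)
```

Because earlier versions are never overwritten, the difference between any two versions can be computed after the fact, which is what makes the re-evaluation of former taxon assignments possible.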

Future development will focus on incorporating the MIxS standard (Yilmaz et al. 2011) and on submitting the published data to the International Nucleotide Sequence Database Collaboration (INSDC) using established dataflows from the German Federation for Biological Data (GFBio) and NFDI4biodiversity.

The ASV data portal is accessible at: https://bolgermany.de/metabarcoding; the source code at: https://gitlab.leibniz-lib.de/GBOL/asv-table-registry.

Keywords

taxon annotation, environmental research, biodiversity monitoring, barcode reference databases, DNA barcodes, FAIR principles

Presenting author

Christian Bräunig

Presented at

TDWG 2022

Funding program

The ASV Table Registry is developed within the project GBOL III - Dark Taxa, funded by the German Federal Ministry of Education and Research (BMBF, grant ID: 01LI1901).

References
