Biodiversity Information Science and Standards : Conference Abstract
Best practices for connecting genetic records with specimen data
Michael Trizna
‡ Smithsonian Institution, Washington, DC, United States of America
Open Access

Abstract

As rapid advances in sequencing technology illuminate more branches of the tree of life, the percentage of sequence records backed by voucher specimens has actually decreased (Trizna 2018b). The good news is that tools exist (Trizna 2017, NCBI 2005, Biocode LLC 2014) that automatically validate and format specimen and collection metadata from well-databased museum vouchers to produce high-quality sequence records. A separate problem is that millions of existing sequence records are known to contain incorrect or incomplete specimen data. I will show an end-to-end example of sequencing specimens from a museum, depositing their sequence records in the National Center for Biotechnology Information's (NCBI) GenBank database, and then providing updates to GenBank as the museum database revises identifications. I will also talk about linking to sequence records from specimen databases. Over one million records in the Global Biodiversity Information Facility (GBIF; Trizna 2018a) contain a value in the Darwin Core term "associatedSequences". I will examine what these entries currently contain and how best to format them so that they connect tightly to sequence records.
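
As a minimal sketch of what formatting "associatedSequences" for a tight connection could look like (not necessarily the approach presented in the talk), the Python function below rewrites bare GenBank accessions as resolvable NCBI nucleotide URLs and joins multiple values with " | ", the separator Darwin Core recommends for list-valued terms. The function name, the simplified accession pattern, and the handling of a leading "GenBank:" label are assumptions made for illustration only.

```python
import re

# Simplified pattern for GenBank nucleotide accessions (an assumption; it does
# not cover every accession format): 1 letter + 5 digits, or 2 letters + 6 or
# 8 digits, with an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$")

NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/"


def normalize_associated_sequences(value):
    """Return (normalized_value, unrecognized_entries) for one field value.

    Hypothetical helper: entries that are already URLs are kept as-is, bare
    accessions become NCBI nucleotide URLs, and anything else is reported
    back for manual review instead of being guessed at.
    """
    links, unrecognized = [], []
    for entry in value.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        if entry.lower().startswith(("http://", "https://")):
            links.append(entry)
            continue
        # Drop a leading label such as "GenBank:" if present (assumed convention).
        candidate = re.sub(r"^genbank\s*:?\s*", "", entry, flags=re.IGNORECASE).upper()
        if ACCESSION_RE.match(candidate):
            links.append(NUCCORE_URL + candidate)
        else:
            unrecognized.append(entry)
    return " | ".join(links), unrecognized


if __name__ == "__main__":
    normalized, leftovers = normalize_associated_sequences(
        "GU328060 | genbank: af326093.1 | see paper"
    )
    print(normalized)
    print(leftovers)
```

Returning the unrecognized entries separately, rather than dropping them, leaves free-text values available for a curator to resolve by hand.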

Keywords

collections, GenBank, museum vouchers, biodiversity informatics, software, data linking, Darwin Core

Presenting author

Michael Trizna

References
