Biodiversity Information Science and Standards
Conference Abstract
Corresponding author: Yasin Bakış (yasinbakis@gmail.com)
Received: 04 Sep 2023 | Published: 07 Sep 2023
© 2023 Yasin Bakış, Xiaojun Wang, Bahadır Altıntaş, Dom Jebbia, Henry Bart Jr.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Bakış Y, Wang X, Altıntaş B, Jebbia D, Bart Jr. HL (2023) On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility: Fish-AIR example. Biodiversity Information Science and Standards 7: e112178. https://doi.org/10.3897/biss.7.112178
A new scientific discipline, Imageomics, has emerged within the last decade at the intersection of informatics, computer science and biology. Like other -omics fields, Imageomics uses emerging technologies to analyze biological data, in this case data derived from images. Machine Learning (ML) is one of the most widely applied analysis methods for image datasets. In 2019, we began work on Biology Guided Neural Networks (BGNN), a project funded by the United States National Science Foundation (NSF) that aims to extract biological information from images by combining neural networks with biological guidance such as species descriptions, identifications, phylogenetic trees and morphological annotations.
Additional flexibility, built into the database infrastructure with the Resource Description Framework (RDF), will enable the system to host other taxonomic groups, which may require new metadata features.
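As a rough illustration of why a triple-based model accommodates new metadata features, the sketch below stores image metadata as subject-predicate-object statements in plain Python. The identifiers and predicate names (`image:001`, `hasQualityScore`, `hasPlumageVisible`) are hypothetical and are not taken from the Fish-AIR schema; a real deployment would presumably use an RDF store rather than a list.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All names below are illustrative assumptions, not Fish-AIR terms.
triples = [
    ("image:001", "hasTaxonGroup", "fish"),
    ("image:001", "hasQualityScore", 0.92),
    ("image:002", "hasTaxonGroup", "bird"),
    # A feature needed only by the new group is just another triple;
    # no table or column has to be altered to record it.
    ("image:002", "hasPlumageVisible", True),
]


def describe(subject, statements):
    """Collect every predicate and its objects recorded for one subject."""
    record = defaultdict(list)
    for s, p, o in statements:
        if s == subject:
            record[p].append(o)
    return dict(record)


fish = describe("image:001", triples)
bird = describe("image:002", triples)
```

Because each record is assembled from whatever statements exist for it, the two images can carry different sets of features without either one needing empty placeholder fields.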
Fish-AIR provides an easy-to-access, filtered, annotated and cleaned biological dataset for researchers from different backgrounds, and it facilitates the integration of biological knowledge from digitized preserved specimens into ML pipelines. Because the database infrastructure is flexible and new datasets continue to be added, researchers will soon be able to access additional types of data, such as landmarks, specimen outlines, annotated parts and quality scores. The dataset is already the largest and most detailed AI-ready fish image dataset with an integrated Image Quality Management System.
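To make the role of image quality metadata in an ML pipeline concrete, the following minimal sketch filters image records before training. The record fields (`quality_score`, `specimen_visible`), the 0 to 1 score scale and the 0.8 threshold are assumptions chosen for illustration; they do not describe the actual Fish-AIR metadata or API.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    image_id: str
    species: str
    quality_score: float    # hypothetical score on a 0-1 scale
    specimen_visible: bool  # hypothetical quality flag


def select_for_training(records, min_quality=0.8):
    """Keep records whose specimen is visible and whose score meets the threshold."""
    return [
        r for r in records
        if r.specimen_visible and r.quality_score >= min_quality
    ]


# Made-up example records, used only to exercise the filter.
records = [
    ImageRecord("img-1", "Notropis atherinoides", 0.95, True),
    ImageRecord("img-2", "Notropis atherinoides", 0.40, True),
    ImageRecord("img-3", "Lepomis macrochirus", 0.90, False),
    ImageRecord("img-4", "Lepomis macrochirus", 0.85, True),
]

selected = select_for_training(records)
selected_ids = [r.image_id for r in selected]
```

Recording the threshold and the flags used alongside the trained model is one way such metadata could support reproducibility, since the same subset can then be rebuilt from the same query.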
Keywords:
biodiversity informatics, data management, machine learning, artificial intelligence, data wrangling
Presenting author:
Yasin Bakış
Presented at:
TDWG 2023
Funding:
- NSF Harnessing the Data Revolution Institute Grant #2118240 (Imageomics)
- NSF Office of Advanced Cyberinfrastructure Grant #1940322 (BGNN)