Biodiversity Information Science and Standards : Conference Abstract
Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example
Bahadir Altintas‡,§, Yasin Bakış, Xiojun Wang, Henry Bart
‡ Tulane University, New Orleans, United States of America
§ Bolu Abant Izzet Baysal University, Bolu, Turkiye
Open Access

Abstract

Artificial Intelligence (AI) is becoming more prevalent in data science and in other areas of computational science. Classification methods commonly used in AI can also be applied to unorganized databases, provided a suitable model is trained. Most classification work is done on image data, for purposes such as object detection and face recognition. Once an object is reliably detected in an image, classification can be used to organize the image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and classify them to generate metadata, so that the dataset can be used as an AI-ready dataset in the future. The main problems of museum image datasets are missing image metadata, incorrect categorization, and poor image quality. AI may make it possible to overcome these problems: automated tools can help find, eliminate, or fix them. For our example, we trained a model for 10 classes (complete fish, partial fish, cleared and stained, photograph, drawing, notes/labels, X-ray, CT (computerized tomography) scan, fossil, and skeleton) using a manually tagged iDigBio image dataset. After training a model for each class, we reclassified the dataset using these trained models. Some of the results are given in Table 1.
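The abstract does not state how the per-class models are combined when the dataset is reclassified. One common reading of "a model for each class" is a one-vs-rest setup, in which each image is assigned to the class whose model scores it highest. The Python sketch below assumes that setup. The `Scorer` interface, the `reclassify` function, the `min_score` threshold, and the toy models are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: each per-class model maps an image's feature
# vector to a score in [0, 1] meaning "this image belongs to my class".
Scorer = Callable[[Sequence[float]], float]


def reclassify(
    features: Sequence[float],
    models: Dict[str, Scorer],
    min_score: float = 0.5,
) -> Optional[str]:
    """One-vs-rest assignment: return the class whose model scores highest,
    or None if no model reaches min_score (hypothetical review threshold)."""
    scores = {name: model(features) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Toy stand-ins for trained models, for illustration only.
toy_models: Dict[str, Scorer] = {
    "xray": lambda f: f[0],
    "complete_fish": lambda f: f[1],
    "drawing": lambda f: f[2],
}

print(reclassify([0.9, 0.2, 0.1], toy_models))  # xray
print(reclassify([0.1, 0.7, 0.3], toy_models))  # complete_fish
print(reclassify([0.3, 0.4, 0.2], toy_models))  # None
```

Returning `None` when no per-class model is confident is one way an automated tool could flag an image for manual review instead of forcing a label onto it.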

Table 1.

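Percentage of images in each manually assigned class (columns) that the models assigned to each class (rows). Diagonal cells show agreement with the manual label; off-diagonal cells show disagreement. For example, 26.08% of the images manually classified as "drawing" were classified as "xray" by the models, and 53.02% as "complete fish".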

model \ manual    xray  cleared_stained  complete_fish  photograph  fossil  labels_notes  skeleton  C-T_scan  partial_fish  drawing
xray             71.63             0.17           0.54        1.74    0.97         19.17      4.35     94.30          2.00    26.08
cleared_stained   0.06             2.24           4.59        0.13    0.10          0.12      0.06      0.00          0.80     1.48
complete_fish     7.90            95.54          66.89       69.50    3.96         19.05     63.81      1.10         82.24    53.02
photograph        0.34             0.06           1.83        8.30    1.07          0.36      1.60      0.03          2.31     0.97
fossil            1.25             0.01           6.63        1.51   44.77          1.31     10.40      0.19          1.85     0.57
labels_notes      2.39             0.04           0.55        1.24   17.33         43.10      0.55      0.11          1.02     4.04
skeleton          0.17             0.01           0.85        0.80    4.88          0.24      9.30      0.03          0.71     0.46
C-T_scan          8.87             0.03           0.02        0.00    0.00          0.36      0.22      2.77          0.10     0.23
partial_fish      1.93             0.82           9.90       10.41    1.02          0.60      3.63      0.30          2.44     1.31
drawing           2.50             0.73           1.45        0.33    0.25          8.10      0.28      0.58          0.28     8.31
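The values in Table 1 appear to be column-normalized: each cell is the number of images with a given (manual label, model label) pair divided by the number of images that carry that manual label. A minimal Python sketch of that calculation follows. The function name `column_percentages` and the toy label pairs are hypothetical, not taken from the iDigBio data.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def column_percentages(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Turn (manual_label, model_label) pairs into Table 1 style percentages.

    result[manual][predicted] is the percentage of images manually tagged
    `manual` that the models assigned to `predicted`.
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for manual, predicted in pairs:
        counts[manual][predicted] += 1

    table: Dict[str, Dict[str, float]] = {}
    for manual, row in counts.items():
        total = sum(row.values())  # number of images with this manual label
        table[manual] = {pred: 100.0 * n / total for pred, n in row.items()}
    return table


# Toy pairs, invented for illustration (not iDigBio data).
toy_pairs = [
    ("drawing", "xray"),
    ("drawing", "complete_fish"),
    ("drawing", "complete_fish"),
    ("drawing", "drawing"),
    ("C-T_scan", "xray"),
]
result = column_percentages(toy_pairs)
print(result["drawing"])  # {'xray': 25.0, 'complete_fish': 50.0, 'drawing': 25.0}
print(result["C-T_scan"])  # {'xray': 100.0}
```

In this sketch every manual-label column sums to 100. The printed columns of Table 1 sum to slightly less (about 97% for "xray"), so the published figures may have been computed over more classes or images than the ten shown.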

As Table 1 shows, even manually classified images can be assigned to different classes by the models, and some classes look very similar to each other, such as CT scans and X-rays (94.30% of the images manually labeled as CT scans were classified as X-rays) or fossils and skeletons. Such similarities are confusing for the human eye as well as for AI models.
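To see which manual classes the models most often reassign, the off-diagonal cells of Table 1 can be ranked. The Python sketch below copies the Table 1 values verbatim and prints the three largest disagreements. The names `CLASSES`, `TABLE1`, and `most_confused` are illustrative only.

```python
from typing import List, Tuple

# Class order shared by the rows (model label) and columns (manual label).
CLASSES = ["xray", "cleared_stained", "complete_fish", "photograph", "fossil",
           "labels_notes", "skeleton", "C-T_scan", "partial_fish", "drawing"]

# TABLE1[i][j] = % of images manually labeled CLASSES[j] that the models
# assigned to CLASSES[i]; values copied from Table 1.
TABLE1 = [
    [71.63, 0.17, 0.54, 1.74, 0.97, 19.17, 4.35, 94.30, 2.00, 26.08],
    [0.06, 2.24, 4.59, 0.13, 0.10, 0.12, 0.06, 0.00, 0.80, 1.48],
    [7.90, 95.54, 66.89, 69.50, 3.96, 19.05, 63.81, 1.10, 82.24, 53.02],
    [0.34, 0.06, 1.83, 8.30, 1.07, 0.36, 1.60, 0.03, 2.31, 0.97],
    [1.25, 0.01, 6.63, 1.51, 44.77, 1.31, 10.40, 0.19, 1.85, 0.57],
    [2.39, 0.04, 0.55, 1.24, 17.33, 43.10, 0.55, 0.11, 1.02, 4.04],
    [0.17, 0.01, 0.85, 0.80, 4.88, 0.24, 9.30, 0.03, 0.71, 0.46],
    [8.87, 0.03, 0.02, 0.00, 0.00, 0.36, 0.22, 2.77, 0.10, 0.23],
    [1.93, 0.82, 9.90, 10.41, 1.02, 0.60, 3.63, 0.30, 2.44, 1.31],
    [2.50, 0.73, 1.45, 0.33, 0.25, 8.10, 0.28, 0.58, 0.28, 8.31],
]


def most_confused(k: int = 3) -> List[Tuple[str, str, float]]:
    """Return the k largest off-diagonal cells as (manual, model, percent)."""
    cells = [
        (CLASSES[j], CLASSES[i], TABLE1[i][j])
        for i in range(len(CLASSES))
        for j in range(len(CLASSES))
        if i != j  # skip agreement cells on the diagonal
    ]
    return sorted(cells, key=lambda c: c[2], reverse=True)[:k]


for manual, predicted, pct in most_confused():
    print(f"{manual} -> {predicted}: {pct:.2f}%")
# cleared_stained -> complete_fish: 95.54%
# C-T_scan -> xray: 94.30%
# partial_fish -> complete_fish: 82.24%
```

The CT scan/X-ray pair ranks second. The fossil/skeleton confusion is smaller in the table (10.40% of manually labeled skeletons assigned to fossil, 4.88% of manually labeled fossils assigned to skeleton).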

Keywords

image metadata, image processing, AI readiness, data preparation workflow

Presenting author

Bahadir Altintas

Presented at

TDWG 2023

Funding program

This research is supported by the National Science Foundation's Harnessing the Data Revolution Institute Grant #2118240 (Imageomics), through a Tulane University subaward from The Ohio State University.

Hosting institution

Tulane University

Conflicts of interest

The authors have declared that no competing interests exist.