Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example

Bahadir Altintas; Yasin Bakış; Xiojun Wang; Henry Bart

doi:10.3897/biss.7.112438

Biodiversity Information Science and Standards : Conference Abstract

Bahadir Altintas^‡,§, Yasin Bakış^‡, Xiojun Wang^‡, Henry Bart^‡

‡ Tulane University, New Orleans, United States of America

§ Bolu Abant Izzet Baysal University, Bolu, Turkiye

Corresponding author: Bahadir Altintas (baltintas@tulane.edu)

Received: 08 Sep 2023 | Published: 11 Sep 2023

This is an open access article distributed under the terms of the CC0 Public Domain Dedication.

Citation: Altintas B, Bakış Y, Wang X, Bart H (2023) Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example. Biodiversity Information Science and Standards 7: e112438. https://doi.org/10.3897/biss.7.112438

Abstract

Artificial Intelligence (AI) is becoming more prevalent in data science as well as in other areas of computational science. Commonly used AI classification methods can also be applied to unorganized databases, provided that a suitable model is trained. Most classification work is done on image data, for purposes such as object detection and face recognition. If an object is reliably detected in an image, classification can then be used to organize image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and classify them to generate metadata for future use as an AI-ready dataset. The main problems of museum image datasets are missing image metadata, wrong categorization, and poor image quality. By using AI, it may be possible to overcome these problems: automatic tools can help find, eliminate or fix them. For our example, we trained a model for 10 classes (e.g., complete fish, photograph, notes/labels, X-ray, CT (computerized tomography) scan, partial fish, fossil, skeleton) by using a manually tagged iDigBio image dataset. After training a model for each class, we reclassified the dataset by using these trained models. Some of the results are given in Table 1.
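The reclassification step can be illustrated with a minimal sketch. It assumes each per-class model returns a score for an image and that an image is assigned to the class whose model scores highest; the abstract does not state how the per-class outputs were combined, so this rule is an assumption. The names `reclassify`, `flag_disagreements`, `toy_models` and `toy_records` are hypothetical, and the toy scorers stand in for trained image models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a per-class model maps an image's feature vector to a score.
# Real models would take pixel data; short feature lists keep the sketch small.
Scorer = Callable[[List[float]], float]


def reclassify(features: List[float], models: Dict[str, Scorer]) -> Tuple[str, float]:
    """Return the class whose model gives the highest score, with that score."""
    scores = {name: model(features) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def flag_disagreements(records, models):
    """Return (image_id, manual_tag, predicted, score) where the model and manual tag differ."""
    flagged = []
    for image_id, manual_tag, features in records:
        predicted, score = reclassify(features, models)
        if predicted != manual_tag:
            flagged.append((image_id, manual_tag, predicted, score))
    return flagged


# Toy stand-ins for trained models: each one reads a single made-up feature.
toy_models = {
    "xray": lambda f: f[0],
    "complete_fish": lambda f: f[1],
    "labels_notes": lambda f: f[2],
}

# Made-up records (image id, manual tag, features); not data from the study.
toy_records = [
    ("img-001", "xray", [0.9, 0.05, 0.05]),
    ("img-002", "complete_fish", [0.2, 0.7, 0.1]),
    ("img-003", "labels_notes", [0.6, 0.1, 0.3]),  # model and manual tag disagree
]

flagged = flag_disagreements(toy_records, toy_models)
for row in flagged:
    print(row)
```

Images flagged this way are candidates for manual review: the disagreement may come from a wrong manual tag or from a model error, and the sketch cannot tell which.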

Table 1.

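Percentage of misclassified samples by models. Each column gives the class the images were classified as and each row the class they were found as; e.g., 26.08% of the images classified as "drawing" were found as "xray" and 53.02% were found as "complete fish".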

classes	xray	cleared_stained	complete_fish	photograph	fossil	labels_notes	skeleton	C-T_scan	partial_fish	drawing
xray	71.63	0.17	0.54	1.74	0.97	19.17	4.35	94.30	2.00	26.08
cleared_stained	0.06	2.24	4.59	0.13	0.10	0.12	0.06	0.00	0.80	1.48
complete_fish	7.90	95.54	66.89	69.50	3.96	19.05	63.81	1.10	82.24	53.02
photograph	0.34	0.06	1.83	8.30	1.07	0.36	1.60	0.03	2.31	0.97
fossil	1.25	0.01	6.63	1.51	44.77	1.31	10.40	0.19	1.85	0.57
labels_notes	2.39	0.04	0.55	1.24	17.33	43.10	0.55	0.11	1.02	4.04
skeleton	0.17	0.01	0.85	0.80	4.88	0.24	9.30	0.03	0.71	0.46
C-T_scan	8.87	0.03	0.02	0.00	0.00	0.36	0.22	2.77	0.10	0.23
partial_fish	1.93	0.82	9.90	10.41	1.02	0.60	3.63	0.30	2.44	1.31
drawing	2.50	0.73	1.45	0.33	0.25	8.10	0.28	0.58	0.28	8.31

As can be seen in the table, even manually classified images can be identified as different classes, and some classes are visually very similar to each other, such as CT scans and X-rays or fossils and skeletons. These kinds of similarities can confuse the human eye as well as AI models.
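Percentages of this kind can be derived from pairs of labels by grouping on one label and normalizing the counts of the other. The sketch below is illustrative only: the function names are hypothetical, the toy pairs are made up rather than taken from the study, and which label plays which role in Table 1 is an assumption.

```python
from collections import Counter, defaultdict


def confusion_percentages(pairs):
    """Map first label -> {second label: percent of that first label's images}."""
    counts = defaultdict(Counter)
    for first, second in pairs:
        counts[first][second] += 1
    table = {}
    for first, row in counts.items():
        total = sum(row.values())
        table[first] = {second: round(100.0 * n / total, 2) for second, n in row.items()}
    return table


def top_confusion(table, label):
    """Return (other_label, percent) for the largest off-diagonal entry, or None."""
    others = {k: v for k, v in table[label].items() if k != label}
    if not others:
        return None
    return max(others.items(), key=lambda kv: kv[1])


# Made-up (first label, second label) pairs for two visually similar classes.
toy_pairs = (
    [("C-T_scan", "xray")] * 3
    + [("C-T_scan", "C-T_scan")] * 1
    + [("fossil", "fossil")] * 3
    + [("fossil", "labels_notes")] * 1
)

table = confusion_percentages(toy_pairs)
print(table["C-T_scan"])
print(top_confusion(table, "C-T_scan"))
```

Listing the largest off-diagonal entry for each class is one quick way to surface visually similar pairs, such as CT scans and X-rays, for closer inspection.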

Keywords

image metadata, image processing, AI readiness, data preparation workflow

Presenting author

Bahadir Altintas

Presented at

TDWG 2023

Acknowledgements

Funding program

This research is supported by the National Science Foundation's Harnessing the Data Revolution Institute Grant #2118240 (Imageomics), through a Tulane University subaward from The Ohio State University.

Hosting institution

Tulane University

Conflicts of interest

The authors have declared that no competing interests exist.
