Biodiversity Information Science and Standards: Conference Abstract
Corresponding author: Bahadir Altintas (baltintas@tulane.edu)
Received: 08 Sep 2023 | Published: 11 Sep 2023
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Altintas B, Bakış Y, Wang X, Bart H (2023) Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example. Biodiversity Information Science and Standards 7: e112438. https://doi.org/10.3897/biss.7.112438
Artificial Intelligence (AI) is becoming more prevalent in data science and in other areas of computational science. Classification methods commonly used in AI can also be applied to unorganized databases, provided that a suitable model is trained. Most classification work is done on image data, for purposes such as object detection and face recognition. If an object is reliably detected in an image, classification can then be used to organize the image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and classify them to generate metadata, so that the dataset can be used as an AI-ready dataset in the future. The main problems of museum image datasets are missing image metadata, wrong categorization, and poor image quality. With AI, it may be possible to overcome these problems: automatic tools can help find, eliminate, or fix them. For our example, we trained models for 10 classes (e.g., complete fish, photograph, notes/labels, X-ray, CT (computerized tomography) scan, partial fish, fossil, skeleton) using a manually tagged iDigBio image dataset. After training a model for each class, we reclassified the dataset using these trained models. Some of the results are given in the table below.
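The reclassification step can be illustrated with a minimal sketch. It assumes that each per-class model returns a confidence score and that an image is assigned to the class with the highest score; the abstract does not state how the per-class outputs were combined, so this rule is only an assumption. The scoring functions, feature names, and image records below are hypothetical stand-ins for trained models and real images.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a trained per-class model: it maps an image
# (here a dict of toy features) to a confidence score in [0, 1].
Scorer = Callable[[dict], float]


def reclassify(image: dict, scorers: Dict[str, Scorer]) -> str:
    """Return the class whose per-class model gives the highest score.

    Assumption: the highest-scoring class wins. The source does not
    describe how the outputs of the per-class models were combined.
    """
    return max(scorers, key=lambda name: scorers[name](image))


# Toy scorers, invented for illustration only.
scorers: Dict[str, Scorer] = {
    "xray": lambda img: img["grayscale"] * 0.9,
    "complete_fish": lambda img: img["has_body_outline"] * 0.8,
    "labels_notes": lambda img: img["text_density"],
}

# Toy image records, invented for illustration only.
images = [
    {"id": "a", "grayscale": 1.0, "has_body_outline": 0.2, "text_density": 0.1},
    {"id": "b", "grayscale": 0.1, "has_body_outline": 1.0, "text_density": 0.3},
    {"id": "c", "grayscale": 0.3, "has_body_outline": 0.0, "text_density": 0.95},
]

# Reclassify every image and keep the result keyed by image id.
predicted = {img["id"]: reclassify(img, scorers) for img in images}
print(predicted)
```

In practice the scorers would be replaced by the trained models, and the predicted labels would be stored next to the manual tags so that the two can be compared.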
Percentage of misclassified samples by models, e.g., 26.08% of the images classified as "drawing" were found as "xray" and 53% were found as "complete fish".
classes | xray | cleared_stained | complete_fish | photograph | fossil | labels_notes | skeleton | C-T_scan | partial_fish | drawing |
xray | 71.63 | 0.17 | 0.54 | 1.74 | 0.97 | 19.17 | 4.35 | 94.30 | 2.00 | 26.08 |
cleared_stained | 0.06 | 2.24 | 4.59 | 0.13 | 0.10 | 0.12 | 0.06 | 0.00 | 0.80 | 1.48 |
complete_fish | 7.90 | 95.54 | 66.89 | 69.50 | 3.96 | 19.05 | 63.81 | 1.10 | 82.24 | 53.02 |
photograph | 0.34 | 0.06 | 1.83 | 8.30 | 1.07 | 0.36 | 1.60 | 0.03 | 2.31 | 0.97 |
fossil | 1.25 | 0.01 | 6.63 | 1.51 | 44.77 | 1.31 | 10.40 | 0.19 | 1.85 | 0.57 |
labels_notes | 2.39 | 0.04 | 0.55 | 1.24 | 17.33 | 43.10 | 0.55 | 0.11 | 1.02 | 4.04 |
skeleton | 0.17 | 0.01 | 0.85 | 0.80 | 4.88 | 0.24 | 9.30 | 0.03 | 0.71 | 0.46 |
C-T_scan | 8.87 | 0.03 | 0.02 | 0.00 | 0.00 | 0.36 | 0.22 | 2.77 | 0.10 | 0.23 |
partial_fish | 1.93 | 0.82 | 9.90 | 10.41 | 1.02 | 0.60 | 3.63 | 0.30 | 2.44 | 1.31 |
drawing | 2.50 | 0.73 | 1.45 | 0.33 | 0.25 | 8.10 | 0.28 | 0.58 | 0.28 | 8.31 |
As the table shows, even manually classified images can be assigned to different classes by the models, and some classes are visually very similar to each other, such as CT scans and X-rays, or fossils and skeletons. These kinds of similarities are confusing for the human eye as well as for AI models.
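A table of this kind can be tallied from pairs of manual and model-assigned labels. The sketch below is one possible way to do so, assuming that each column of the table corresponds to a manually tagged class and each row to a predicted class. The function names and the toy label pairs are hypothetical and are not the results reported above.

```python
from collections import Counter, defaultdict


def percentage_table(pairs):
    """Build {manual_label: {predicted_label: percent}} from label pairs.

    Each inner dict plays the role of one table column: the share of a
    manually tagged class that the models assigned to each class.
    """
    counts = defaultdict(Counter)
    for manual, predicted in pairs:
        counts[manual][predicted] += 1
    table = {}
    for manual, row in counts.items():
        total = sum(row.values())
        table[manual] = {
            pred: round(100.0 * n / total, 2) for pred, n in row.items()
        }
    return table


def dominant_confusions(table):
    """List manual classes whose most frequent prediction is another class."""
    confused = {}
    for manual, row in table.items():
        top = max(row, key=row.get)
        if top != manual:
            confused[manual] = (top, row[top])
    return confused


# Toy (manual, predicted) pairs, invented for illustration only.
pairs = (
    [("C-T_scan", "xray")] * 9
    + [("C-T_scan", "C-T_scan")] * 1
    + [("xray", "xray")] * 7
    + [("xray", "C-T_scan")] * 1
    + [("xray", "complete_fish")] * 2
)

table = percentage_table(pairs)
confusions = dominant_confusions(table)
print(table)
print(confusions)
```

Flagging the classes whose dominant prediction differs from the manual tag is a simple way to surface visually similar pairs, such as CT scans and X-rays, for manual review.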
Keywords: image metadata, image processing, AI readiness, data preparation workflow
Bahadir Altintas
TDWG 2023
This research is supported by the National Science Foundation's Harnessing the Data Revolution Institute Grant #2118240 (Imageomics), through a Tulane University subaward from The Ohio State University.
Tulane University