Biodiversity Information Science and Standards: Conference Abstract
Corresponding authors: José Alejandro Chavarría Madriz (jachm@estudiantec.cr), Maria Auxiliadora Mora-Cross (mariamoracross@gmail.com), William Ulate (wulate@crbio.org)
Received: 20 Sep 2023 | Published: 21 Sep 2023
© 2023 José Chavarría Madriz, Maria Mora-Cross, William Ulate
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Chavarría Madriz JA, Mora-Cross MA, Ulate W (2023) Comparative Study: Evaluating the effects of class balancing on transformer performance in the PlantNet-300k image dataset. Biodiversity Information Science and Standards 7: e113057. https://doi.org/10.3897/biss.7.113057
Image-based identification of plant specimens plays a crucial role in fields such as agriculture, ecology, and biodiversity conservation. The growing interest in deep learning has led to remarkable advances in image classification techniques, particularly through the use of convolutional neural networks (CNNs). Since 2015, in the context of the PlantCLEF (Conference and Labs of the Evaluation Forum) challenge (
In this study, we focus on the image classification task using the PlantNet-300k dataset (
To address the inherent challenges of the PlantNet-300k dataset, we took a two-fold approach. First, we used transformer-based models to tackle the dataset's intrinsic ambiguity and to capture the complex visual patterns present in plant images. Second, we mitigated class imbalance through data preprocessing, specifically class balancing methods, with the aim of giving every plant species fair representation and thereby improving the overall performance of image classification models.
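As an illustration of what a class balancing step can look like, the following minimal sketch applies random oversampling, one common balancing method. The abstract does not state which balancing methods were used, so the choice of oversampling, the function name `oversample_to_balance`, and the toy file names and species labels are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, seed=0):
    """Randomly duplicate minority-class samples until every class
    reaches the size of the largest class.

    `samples` is a list of (item, label) pairs. This is a hypothetical
    helper for illustration, not the method used in the study.
    """
    rng = random.Random(seed)

    # Group the items by their class label.
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    # Every class is grown to the size of the most frequent class.
    target = max(len(items) for items in by_label.values())

    balanced = []
    for label, items in by_label.items():
        # Keep all original samples of this class.
        balanced.extend((item, label) for item in items)
        # Add random duplicates to fill the gap to the target size.
        extra = target - len(items)
        balanced.extend((rng.choice(items), label) for _ in range(extra))

    rng.shuffle(balanced)
    return balanced


# Toy long-tailed data with made-up file names and species labels.
toy = (
    [(f"img_{i}.jpg", "species_a") for i in range(6)]
    + [("img_6.jpg", "species_b")]
    + [("img_7.jpg", "species_c"), ("img_8.jpg", "species_c")]
)

balanced = oversample_to_balance(toy)
counts = Counter(label for _, label in balanced)
```

After the call, each of the three toy classes holds six samples. In practice such resampling would normally be applied to the training split only, so that validation and test metrics still reflect the original class distribution.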
Our objective is to assess the effects of data preprocessing techniques, specifically class balancing, on classification performance on the PlantNet-300k dataset. We explored different preprocessing methods to address the class imbalance and compared the performance of transformer-based models trained with and without class balancing. Our ultimate goal is to determine whether these techniques yield more accurate and reliable classification results, particularly for underrepresented species in the dataset.
In our experiment, we compared two transformer-based models, the Vision Transformer (ViT) and the Convolutional vision Transformer (CvT), on two versions of the PlantNet-300k dataset: one with class balancing and one without. This setup yields four sets of metrics for evaluation. To assess classification performance, we used a range of common metrics, including recall, precision, accuracy, and the ROC (Receiver Operating Characteristic) curve with its AUC (Area Under the Curve), among others. These metrics provide insight into each model's ability to correctly classify plant species, its false positives and false negatives, its overall accuracy, and its discriminatory power.
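To make the role of these metrics concrete, the sketch below computes accuracy together with macro-averaged precision and recall for a multi-class problem. The abstract does not specify an averaging scheme, so macro averaging is an assumption here, as are the function name `classification_report` and the toy labels. Macro averaging weights every class equally, which makes errors on underrepresented species visible.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Accuracy plus per-class and macro-averaged precision and recall.

    Hypothetical helper for illustration; a class with no predictions
    (or no true samples) gets a score of 0.0 for the undefined metric.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    per_class = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        per_class[c] = {
            "precision": tp[c] / predicted if predicted else 0.0,
            "recall": tp[c] / actual if actual else 0.0,
        }

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(m["precision"] for m in per_class.values()) / n,
        "macro_recall": sum(m["recall"] for m in per_class.values()) / n,
        "per_class": per_class,
    }


# Toy imbalanced example: the single sample of class "b" is misclassified as "a".
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "a", "c"]
report = classification_report(y_true, y_pred)
```

In this toy case the accuracy is 5/6, while the macro recall drops to 2/3 because class "b" is never recognized. The same function could be run once per model and dataset version to obtain the four sets of metrics described above.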
With this comparative study, we seek to contribute to plant identification research by providing empirical evidence of the benefits and effectiveness of class balancing techniques for improving the performance of transformer-based models on the PlantNet-300k dataset and similar datasets.
Keywords: deep learning, image-based identification, plant specimens, transformer-based models
Presenting author: José Alejandro Chavarría Madriz
Presented at: TDWG 2023