Deep learning models have shown remarkable success in disease detection and classification tasks, but they lack transparency in their decision-making process, creating reliability and trust issues. Traditional evaluation methods focus entirely on performance metrics such as classification accuracy, precision, and recall, and thus fail to assess whether the models are relying on relevant features for decision-making. The main objective of this work is to develop and validate a comprehensive three-stage methodology that combines conventional performance evaluation with qualitative and quantitative evaluation of explainable artificial intelligence (XAI) visualizations, assessing both the accuracy and the reliability of deep learning models. Eight pre-trained deep learning models (ResNet50, InceptionResNetV2, DenseNet201, InceptionV3, EfficientNetB0, Xception, VGG16, and AlexNet) were evaluated using this three-stage methodology. First, the models are assessed using traditional classification metrics. Second, Local Interpretable Model-agnostic Explanations (LIME) is employed to visualize and quantitatively evaluate feature selection, using metrics such as Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Third, a novel overfitting ratio metric is introduced to quantify the models' reliance on insignificant features. In the experimental analysis, ResNet50 emerged as the most accurate model, achieving 99.13% classification accuracy, and also as the most reliable, demonstrating superior feature selection capabilities (IoU: 0.432, overfitting ratio: 0.284). Despite their high classification accuracies, models such as InceptionV3 and EfficientNetB0 showed poor feature selection capabilities, with low IoU scores (0.295 and 0.326) and high overfitting ratios (0.544 and 0.458), indicating potential reliability issues in real-world applications. This study introduces a novel quantitative methodology for evaluating deep learning models that goes beyond traditional accuracy metrics, enabling more reliable and trustworthy AI systems for agricultural applications. The methodology is generic, and researchers can explore extending it to other domains that require transparent and interpretable AI systems.
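The abstract does not give formulas for the second- and third-stage metrics, but the standard definitions of IoU and DSC over binary masks are well established, so a minimal sketch is possible. The sketch below assumes the LIME explanation has already been reduced to a binary mask (e.g. via `lime.lime_image`'s `get_image_and_mask`) and compared against an annotated ground-truth relevance mask; the `overfitting_ratio` function is a hypothetical reading of the paper's novel metric, taken here as the fraction of the explanation mask that falls outside the relevant region, since the abstract only says it quantifies reliance on insignificant features.

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union between a binary explanation mask
    (e.g. LIME superpixels) and a binary ground-truth relevance mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 0.0 if union == 0 else np.logical_and(pred, gt).sum() / union

def dice(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Dice Similarity Coefficient (DSC) between the two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    denom = pred.sum() + gt.sum()
    return 0.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom

def overfitting_ratio(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Hypothetical interpretation of the paper's overfitting ratio:
    the share of explanation pixels lying outside the ground-truth
    region, i.e. attention spent on insignificant features. The paper's
    exact definition may differ."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    if pred.sum() == 0:
        return 0.0
    return np.logical_and(pred, ~gt).sum() / pred.sum()

if __name__ == "__main__":
    # Toy 8x8 masks standing in for a LIME mask (pred) and an annotated
    # lesion region (gt); shapes and values are purely illustrative.
    gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True
    pred = np.zeros((8, 8), dtype=bool); pred[3:7, 3:7] = True
    print(f"IoU: {iou(pred, gt):.3f}")                        # 0.391
    print(f"DSC: {dice(pred, gt):.3f}")                       # 0.562
    print(f"Overfitting ratio: {overfitting_ratio(pred, gt):.3f}")  # 0.438
```

Under this reading, a well-grounded model has a high IoU/DSC and a low overfitting ratio, matching the pattern the abstract reports for ResNet50 versus InceptionV3 and EfficientNetB0.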
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12397440 | PMC |
| http://dx.doi.org/10.1038/s41598-025-14306-3 | DOI Listing |