Deep learning models have shown remarkable success in disease detection and classification tasks, but they lack transparency in their decision-making process, creating reliability and trust issues. Traditional evaluation methods focus entirely on performance metrics such as classification accuracy, precision, and recall, and thus fail to assess whether the models are relying on relevant features for decision-making. The main objective of this work is to develop and validate a comprehensive three-stage methodology that combines conventional performance evaluation with qualitative and quantitative evaluation of explainable artificial intelligence (XAI) visualizations, assessing both the accuracy and the reliability of deep learning models. Eight pre-trained deep learning models (ResNet50, InceptionResNetV2, DenseNet201, InceptionV3, EfficientNetB0, Xception, VGG16, and AlexNet) were evaluated using this three-stage methodology. First, the models are assessed using traditional classification metrics. Second, Local Interpretable Model-agnostic Explanations (LIME) is employed to visualize and quantitatively evaluate feature selection, using metrics such as Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Third, a novel overfitting ratio metric is introduced to quantify the models' reliance on insignificant features. In the experimental analysis, ResNet50 emerged as the most accurate model, achieving 99.13% classification accuracy, and also as the most reliable, demonstrating superior feature selection capabilities (IoU: 0.432, overfitting ratio: 0.284). Despite their high classification accuracies, models such as InceptionV3 and EfficientNetB0 showed poor feature selection capabilities, with low IoU scores (0.295 and 0.326) and high overfitting ratios (0.544 and 0.458), indicating potential reliability issues in real-world applications. This study introduces a novel quantitative methodology for evaluating deep learning models that goes beyond traditional accuracy metrics, enabling more reliable and trustworthy AI systems for agricultural applications. The methodology is generic, and researchers can explore extending it to other domains that require transparent and interpretable AI systems.
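The abstract does not give formulas for the second- and third-stage metrics, but the standard definitions of IoU and DSC over binary masks are well established, so a minimal sketch is possible. The sketch below assumes the LIME explanation has already been reduced to a binary mask (e.g. via `lime.lime_image`'s `get_image_and_mask`) and compared against an annotated ground-truth relevance mask; the `overfitting_ratio` function is a hypothetical reading of the paper's novel metric, taken here as the fraction of the explanation mask that falls outside the relevant region, since the abstract only says it quantifies reliance on insignificant features.

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union between a binary explanation mask
    (e.g. LIME superpixels) and a binary ground-truth relevance mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 0.0 if union == 0 else np.logical_and(pred, gt).sum() / union

def dice(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Dice Similarity Coefficient (DSC) between the two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    denom = pred.sum() + gt.sum()
    return 0.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom

def overfitting_ratio(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Hypothetical interpretation of the paper's overfitting ratio:
    the share of explanation pixels lying outside the ground-truth
    region, i.e. attention spent on insignificant features. The paper's
    exact definition may differ."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    if pred.sum() == 0:
        return 0.0
    return np.logical_and(pred, ~gt).sum() / pred.sum()

if __name__ == "__main__":
    # Toy 8x8 masks standing in for a LIME mask (pred) and an annotated
    # lesion region (gt); shapes and values are purely illustrative.
    gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True
    pred = np.zeros((8, 8), dtype=bool); pred[3:7, 3:7] = True
    print(f"IoU: {iou(pred, gt):.3f}")                        # 0.391
    print(f"DSC: {dice(pred, gt):.3f}")                       # 0.562
    print(f"Overfitting ratio: {overfitting_ratio(pred, gt):.3f}")  # 0.438
```

Under this reading, a well-grounded model has a high IoU/DSC and a low overfitting ratio, matching the pattern the abstract reports for ResNet50 versus InceptionV3 and EfficientNetB0.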
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12397440 | PMC |
| http://dx.doi.org/10.1038/s41598-025-14306-3 | DOI Listing |