Testing the performance, adequacy, and applicability of an artificial intelligence model for pediatric pneumonia diagnosis.

Sara Domínguez-Rodríguez, Helena Liz-López, Angel Panizo-LLedot, Álvaro Ballesteros, Ron Dagan, David Greenberg, Lourdes Gutiérrez, Pablo Rojo, Enrique Otheo, Juan Carlos Galán, Sara Villanueva, Sonsoles García, Pablo Mosquera, Alfredo Tagarro, Cinta Moraleda, David Camacho

Comput Methods Programs Biomed

Computer Systems Engineering Department, Universidad Politécnica de Madrid, Spain.

Published: December 2023

Background: Community-acquired pneumonia (CAP) is a common childhood infectious disease. Deep learning models show promise in X-ray interpretation and diagnosis, but their validation should be extended because of limitations in the current validation workflow. To extend the standard validation workflow, we propose a pilot test with the following characteristics. First, the assumption of a perfect ground truth (100% sensitive and specific) is unrealistic, as high intra- and inter-observer variability has been reported. To address this, we propose using Bayesian latent class models (BLCA) to estimate accuracy during the pilot. Second, assessing only the performance of a model, without considering its applicability and acceptance by physicians, is insufficient if we hope to integrate AI systems into day-to-day clinical practice. Therefore, we propose employing explainable artificial intelligence (XAI) methods during the pilot test to involve physicians, to evaluate how well a deep learning model is accepted and how helpful it is for routine decisions, and to analyze its limitations by assessing the etiology. This study aims to apply the proposed pilot to test a deep convolutional neural network (CNN)-based model for identifying consolidation in pediatric chest X-ray (CXR) images that was already validated using the standard workflow.
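
To illustrate why assuming a perfect ground truth can distort accuracy estimates, the following is a minimal sketch and not the authors' BLCA model. It assumes that the model and the reference reading are conditionally independent given the true disease status; the function name `apparent_accuracy` and all numeric values are hypothetical and chosen only for illustration.

```python
def apparent_accuracy(prev, se_test, sp_test, se_ref, sp_ref):
    """Sensitivity and specificity of a test as measured against an
    imperfect reference, assuming conditional independence of the two
    readings given the true disease status (illustrative assumption)."""
    # Joint probabilities that test and reference agree on each result.
    both_pos = prev * se_test * se_ref + (1 - prev) * (1 - sp_test) * (1 - sp_ref)
    both_neg = prev * (1 - se_test) * (1 - se_ref) + (1 - prev) * sp_test * sp_ref
    # Marginal probabilities of the reference result.
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    ref_neg = 1 - ref_pos
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical inputs: true model accuracy 0.90 / 0.80, imperfect reference.
se_app, sp_app = apparent_accuracy(prev=0.6, se_test=0.90, sp_test=0.80,
                                   se_ref=0.85, sp_ref=0.85)
print(f"apparent sensitivity: {se_app:.3f}")
print(f"apparent specificity: {sp_app:.3f}")
```

With these hypothetical inputs, both apparent values fall below the true 0.90 and 0.80, while setting `se_ref` and `sp_ref` to 1.0 recovers the true values. A latent class model addresses this by estimating the accuracy of both readings jointly instead of treating one as error-free.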

Methods: For the standard validation workflow, a total of 5856 public CXRs and 950 private CXRs were used to train and validate the CNN model, and its performance was estimated assuming a perfect ground truth. For the pilot test proposed in this article, a total of 190 pediatric CXR images were used to test the support decision tool (SDT) built on the CNN model. The performance of the model on the pilot test was estimated using extensions of the two-test Bayesian latent class model (BLCA). The sensitivity, specificity, and accuracy of the model were also assessed. The clinical characteristics of the patients were compared according to the model performance. The adequacy and applicability of the SDT were tested using XAI techniques. Adequacy was assessed by asking two senior physicians how often they agreed with the SDT. Applicability was tested by evaluating three medical residents before and after they used the SDT, and the agreement between experts was calculated using the kappa index.
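
The kappa index mentioned above corrects raw agreement for the agreement expected by chance. The sketch below assumes Cohen's kappa for a single pair of readers; the abstract does not state which kappa variant was used, and the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical readings: 1 = consolidation, 0 = no consolidation/other.
reader = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
panel = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
kappa = cohen_kappa(reader, panel)
print(f"kappa: {kappa:.2f}")
```

Here the readers agree on 8 of 10 items, but because chance alone would produce 52% agreement with these label frequencies, kappa is noticeably lower than the raw agreement.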

Results: The CXRs of the pilot test were labeled by the panel of experts as consolidation (124/176, 70.4%) or no consolidation/other infiltrates (52/176, 29.5%). A total of 31/176 (17.6%) discrepancies were found between the model and the panel of experts, with a kappa index of 0.6. The sensitivity and specificity reached medians of 90.9 (95% credible interval (CrI), 81.2-99.9) and 77.7 (95% CrI, 63.3-98.1), respectively. The senior physicians reported a high agreement rate (70%) with the system in identifying logical consolidation patterns. The three medical residents reached higher agreement with the experts when using the SDT than when reading alone (0.75±0.2 with the SDT vs. 0.66±0.1 alone).
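
As a small arithmetic aid, the raw agreement between the model and the expert panel can be derived from the reported discrepancy count. The variable names are illustrative; only the counts 176 and 31 come from the results above.

```python
total_cxrs = 176      # labeled pilot CXRs reported in the results
discrepancies = 31    # model vs. expert panel disagreements

# Raw (not chance-corrected) agreement implied by the discrepancy count.
observed_agreement = (total_cxrs - discrepancies) / total_cxrs
print(f"observed agreement: {observed_agreement:.1%}")
```

This gives 145/176, or about 82.4% raw agreement, which sits above the reported kappa of 0.6 because kappa discounts agreement expected by chance.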

Conclusions: Through the pilot test, we verified that the performance of the deep learning model was underestimated when a perfect ground truth was assumed. Furthermore, the adequacy and applicability tests indicate that the model is able to identify logical patterns within the CXRs and that augmenting clinicians with automated preliminary read assistants could accelerate their workflows and enhance accuracy in identifying consolidation in pediatric CXR images.

DOI: http://dx.doi.org/10.1016/j.cmpb.2023.107765

Similar Publications

Evaluating the quality, feasibility and patient satisfaction of medication history taking by telephone for patients with scheduled admissions: a pilot study.

Int J Clin Pharm

September 2025

Heidelberg University, Medical Faculty Heidelberg / Heidelberg University Hospital, Internal Medicine IX - Department of Clinical Pharmacology and Pharmacoepidemiology, Cooperation Unit Clinical Pharmacy, Im Neuenheimer Feld 410, 69120, Heidelberg, Germany.

Theresa Terstegen, Janina A Bittmann, Luise Kauk, Marietta Kirchner, Sebastian Krug

Introduction: Medication history taking at hospital admission is still prone to errors. Despite numerous quality improvement initiatives, new strategies to improve medication history taking are still sought and evaluated. Unfortunately, the gold standard research methodology for evaluation is resource-intensive, as it requires each patient to complete two medication history interviews.

Feasibility of a Gamified Mobile-Based Self-Management Intervention for Individuals With Nonspecific Chronic Lower Back Pain.

Nurs Res

September 2025

College of Nursing & Institute of Nursing Research, Korea University, Seoul, South Korea.

Se Jin Hong, Soyeon Park, Namsu Kim, Minsuh Chung, Youlee Jung

Background: Existing research fails to address the complex nature of nonspecific chronic lower back pain (cLBP) despite its detrimental effect on economic, societal, and medical expenditures.

Objectives: We developed a nurse-led, mobile-delivered self-management intervention-Problem-Solving Pain to Enhance Living Well (PROPEL-M)-and evaluated its usability, feasibility, and initial efficacy for South Korean adults with nonspecific cLBP.

Methods: This study was composed of two phases: (a) lab and field usability testing for a gamified mobile device application; and (b) a pilot study employing a one-arm pre-test and post-test design among adults aged 18-60 years with nonspecific cLBP.

Pilot Study: Identification of the T Cell Tumor Microenvironment of Premalignant and Malignant Anal Lesions.

Dis Colon Rectum

September 2025

Department of Surgery, Oregon Health & Science University, Portland, Oregon.

Cynthia Araradian, Mariah Erlick, Shaun Goodyear, Adel Kardosh, Brian Mau

Background: Anal squamous cell cancer incidence has risen 2.2% each year over the past decade. Current screening includes anal cytology and high-resolution anoscopy but is burdened with sampling error and patient discomfort.
