Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study.

Seungheon Choo, Suyoung Yoo, Kumiko Endo, Bao Truong, Meong Hi Son

JMIR Nurs

Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea.

Published: February 2025

Background: The health care sector faces a projected shortfall of 10 million workers by 2030. Artificial intelligence (AI) automation in areas such as patient education and initial therapy screening is a strategic response that could mitigate this shortage and free medical staff for higher-priority tasks. However, current methods of evaluating early-stage health care AI chatbots are highly limited, both by safety concerns and by the time and effort that evaluation requires.

Objective: This study introduces a novel 3-bot method for efficiently testing and validating early-stage AI health care provider chatbots. To extensively test AI provider chatbots without involving real patients or researchers, various AI patient bots and an evaluator bot were developed.

Methods: Provider bots interacted with AI patient bots embodying frustrated, anxious, or depressed personas. An evaluator bot reviewed interaction transcripts based on specific criteria. Human experts then reviewed each interaction transcript, and the evaluator bot's results were compared to human evaluation results to ensure accuracy.
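The interaction loop described in the Methods can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: every bot here is a hypothetical stub function rather than a language model, and the two evaluation criteria are placeholders, since the abstract does not list the criteria actually used. Only the three persona names come from the abstract.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "bot" maps the transcript so far to its next reply.
Bot = Callable[[List[str]], str]

# Personas named in the abstract.
PERSONAS = ("frustrated", "anxious", "depressed")


def make_patient_bot(persona: str) -> Bot:
    """Build a stub patient bot that tags each message with its persona."""
    def patient(transcript: List[str]) -> str:
        turn = len(transcript) // 2 + 1
        return f"[{persona} patient] turn {turn}"
    return patient


def provider_bot(transcript: List[str]) -> str:
    """Stub provider bot: replies to the most recent patient message."""
    return f"[provider] reply to: {transcript[-1]}"


def run_dialogue(patient: Bot, provider: Bot, turns: int = 3) -> List[str]:
    """Alternate patient and provider messages and return the transcript."""
    transcript: List[str] = []
    for _ in range(turns):
        transcript.append(patient(transcript))
        transcript.append(provider(transcript))
    return transcript


def evaluator_bot(transcript: List[str]) -> Dict[str, bool]:
    """Stub evaluator: scores a transcript against placeholder criteria."""
    provider_lines = transcript[1::2]  # provider speaks on odd indices
    return {
        "responded_every_turn": len(provider_lines) == len(transcript) // 2,
        "no_empty_reply": all(line.strip() for line in provider_lines),
    }


def evaluate_all(turns: int = 3) -> Dict[str, Dict[str, bool]]:
    """Run one dialogue per persona and collect the evaluator's scores."""
    results = {}
    for persona in PERSONAS:
        transcript = run_dialogue(make_patient_bot(persona), provider_bot, turns)
        results[persona] = evaluator_bot(transcript)
    return results
```

In a real system each stub would presumably wrap a prompted language model, and the transcripts returned by `run_dialogue` would also be handed to human experts so that their ratings can be compared with the evaluator bot's.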

Results: For the patient-education bot, the evaluations by the AI evaluator and the human evaluator were nearly identical, with minimal variance, which limited the opportunity for further analysis. For the screening bot, the AI evaluator and the human evaluator also produced similar results. Statistical analysis confirmed the reliability and accuracy of the AI evaluations.
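The abstract does not name the statistics used to compare AI and human evaluations. As an assumption for illustration, the sketch below computes two common agreement measures, raw percent agreement and Cohen's kappa, over paired ratings. The function names are hypothetical, and any ratings passed in would be the reader's own data, not results from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)
```

The undefined case is relevant to the patient-education result: when both raters give nearly the same label to every item, chance-corrected statistics such as kappa become undefined or unstable, which is one way that minimal variance can limit further analysis.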

Conclusions: This evaluation method offers a safe, adaptable, and effective means to test and refine early versions of health care provider chatbots without risking patient safety or investing excessive researcher time and effort. Our patient-education evaluator bots could have benefited from a larger set of evaluation criteria: the AI and human evaluators produced extremely similar results, which could have arisen because of the small number of criteria. We were also limited in how much prompting we could give each bot, because response time increases with prompt length. In the future, techniques such as retrieval-augmented generation will allow the system to receive more information and become more specific and accurate in evaluating the chatbots. This evaluation method will allow for rapid testing and validation of health care chatbots to automate basic medical tasks, freeing providers to address more complex tasks.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884306
DOI: http://dx.doi.org/10.2196/63058

Similar Publications

Integrating medical physics into an EMR-based radiology feedback system for quality improvement.

J Appl Clin Med Phys

September 2025

Clinical Imaging Physics Group, Duke University Health System, Durham, North Carolina, USA.

Megan K Russ, Justin Solomon, Steve Bache, Nicole M Lafata, Erin B Macdonald

Introduction: Medical physicists play a critical role in ensuring image quality and patient safety, but their routine evaluations are limited in scope and frequency compared to the breadth of clinical imaging practices. An electronic radiologist feedback system can augment medical physics oversight for quality improvement. This work presents a novel quality feedback system integrated into the Epic electronic medical record (EMR) at a university hospital system, designed to facilitate feedback from radiologists to medical physicists and technologist leaders.

Long-term recovery of sensorimotor functions and prediction of participation in survivors of critical illness: a prospective cohort study.

J Intensive Care

September 2025

German Center for Vertigo and Balance Disorders, Ludwig-Maximilians-Universitat (LMU), University Hospital Grosshadern, Munich, Germany.

Johanna Weghorn, Melanie Finsterhölzl, Franziska Wippenbeck, Klaus Jahn, Marion Egger

Background: Survivors of critical illness frequently face physical, cognitive and psychological impairments after intensive care. Sensorimotor impairments potentially have a negative impact on participation. However, comprehensive understanding of sensorimotor recovery and participation in survivors of critical illness is limited.

The systematic assessment of completeness of public metadata accompanying omics studies in the Gene Expression Omnibus data repository.

Genome Biol

September 2025

Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, 90089, USA.

Yu-Ning Huang, Pooja Vinod Jaiswal, Anushka Rajes, Anushka Yadav, Dottie Yu

Background: Recent advances in high-throughput sequencing technologies have enabled the collection and sharing of a massive amount of omics data, along with its associated metadata, that is, descriptive information that contextualizes the data, including phenotypic traits and experimental design. Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet, incomplete metadata accompanying public omics data may hinder reproducibility and reusability and limit secondary analyses.

Gut health and physiological aspects of broiler chicken fed zeolite as a dietary supplement: its effect on growth, cecal microbiota and digesta viscosity, digestive enzymes, carcass traits, blood constituents and antioxidant parameters.

BMC Vet Res

September 2025

Department of Poultry Production, Faculty of Agriculture, Fayoum University, Fayoum, 63514, Egypt.

Ibrahim A Abdel-Kader, Shaaban Saad Elnesr, Bothaina Y Mahmoud, Ensaf A El-Full, Ahmed M Emam

This study investigated the impact of dietary zeolite supplementation on growth, cecal microbiota and digesta viscosity, digestive enzymes, carcass traits, blood constituents, and antioxidant parameters of broilers. A completely randomized design was used with 240 one-day-old broiler chicks randomly assigned to three dietary treatments (0%, 1.5%, and 3% zeolite as a feed additive) with four replicates of 20 chicks each.
