Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study.

JMIR Nurs

Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea.

Published: February 2025



Article Abstract

Background: The health care sector faces a projected shortfall of 10 million workers by 2030. Artificial intelligence (AI) automation in areas such as patient education and initial therapy screening presents a strategic response to mitigate this shortage and reallocate medical staff to higher-priority tasks. However, current methods of evaluating early-stage health care AI chatbots are highly limited due to safety concerns and the amount of time and effort that goes into evaluating them.

Objective: This study introduces a novel 3-bot method for efficiently testing and validating early-stage AI health care provider chatbots. To extensively test AI provider chatbots without involving real patients or researchers, various AI patient bots and an evaluator bot were developed.

Methods: Provider bots interacted with AI patient bots embodying frustrated, anxious, or depressed personas. An evaluator bot reviewed interaction transcripts based on specific criteria. Human experts then reviewed each interaction transcript, and the evaluator bot's results were compared to human evaluation results to ensure accuracy.
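The interaction loop described in the Methods can be sketched as follows. This is a minimal illustration only, not the study's implementation: the bots here are plain-Python stubs rather than language models, and every name (`make_patient_bot`, `provider_bot`, `run_conversation`, `evaluator_bot`) and the example criteria are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]               # (speaker, message)
Bot = Callable[[List[Turn]], str]    # sees the transcript so far, returns the next message


def make_patient_bot(persona: str) -> Bot:
    """Hypothetical stub; a real patient bot would prompt an LLM with the persona."""
    def patient(transcript: List[Turn]) -> str:
        return f"[{persona} patient] message at turn {len(transcript)}"
    return patient


def provider_bot(transcript: List[Turn]) -> str:
    """Hypothetical stub standing in for the provider chatbot under test."""
    return f"[provider] message at turn {len(transcript)}"


def run_conversation(provider: Bot, patient: Bot, n_exchanges: int) -> List[Turn]:
    """Alternate patient and provider turns and return the full transcript."""
    transcript: List[Turn] = []
    for _ in range(n_exchanges):
        transcript.append(("patient", patient(transcript)))
        transcript.append(("provider", provider(transcript)))
    return transcript


def evaluator_bot(transcript: List[Turn], criteria: List[str]) -> Dict[str, bool]:
    """Hypothetical stub: marks every criterion as met if the provider spoke at all.
    A real evaluator bot would judge the transcript against each criterion."""
    provider_spoke = any(speaker == "provider" for speaker, _ in transcript)
    return {criterion: provider_spoke for criterion in criteria}


# Personas come from the abstract; the criteria below are invented for illustration.
personas = ["frustrated", "anxious", "depressed"]
criteria = ["stays on topic", "uses plain language"]

results = {}
for persona in personas:
    transcript = run_conversation(provider_bot, make_patient_bot(persona), n_exchanges=3)
    results[persona] = evaluator_bot(transcript, criteria)
```

In this arrangement, human experts would score the same transcripts against the same criteria, so that the evaluator bot's output can be compared with theirs item by item.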

Results: The patient-education bot's evaluations by the AI evaluator and the human evaluator were nearly identical, with minimal variance, limiting the opportunity for further analysis. The screening bot's evaluations also yielded similar results between the AI evaluator and human evaluator. Statistical analysis confirmed the reliability and accuracy of the AI evaluations.
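The abstract does not name the statistics used to compare the AI and human evaluators. As an illustration only, the sketch below computes two common agreement measures, raw percent agreement and Cohen's kappa, over hypothetical pass/fail ratings. The function names and the example ratings are assumptions, not study data.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters gave the same rating."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[float]:
    """Chance-corrected agreement between two raters.

    Returns None when expected chance agreement is 1, which happens when both
    raters assign the same single category to every item: kappa is undefined.
    """
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return None
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 1 = criterion met, 0 = criterion not met.
ai_ratings = [1, 1, 0, 0]
human_ratings = [1, 1, 0, 1]

agreement = percent_agreement(ai_ratings, human_ratings)
kappa = cohens_kappa(ai_ratings, human_ratings)
```

One consequence of the kappa formula is relevant to near-identical ratings: when both raters give every item the same rating, there is no variance to correct for and the statistic cannot be computed. This is one way that minimal variance can limit further analysis.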

Conclusions: This evaluation method offers a safe, adaptable, and effective means to test and refine early versions of health care provider chatbots without risking patient safety or investing excessive researcher time and effort. Our patient-education evaluator bots could have benefited from a larger set of evaluation criteria: the AI and human evaluators produced extremely similar results, which may have arisen because of the small number of criteria. We were limited in how much prompting we could give each bot because response time increases as prompts grow longer. In the future, techniques such as retrieval-augmented generation will allow the system to receive more information and become more specific and accurate in evaluating the chatbots. This evaluation method will allow for rapid testing and validation of health care chatbots to automate basic medical tasks, freeing providers to address more complex tasks.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11884306
DOI: http://dx.doi.org/10.2196/63058

