Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians.

Eric J Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L Andrew DiFronzo, David Rhew, Mark I Feng

World J Urol

Department of Urology, Baldwin Park Medical Center, Kaiser Permanente, 1011 Baldwin Park Blvd., Baldwin Park, CA, 91706, USA.

Published: December 2024

Purpose: To evaluate the accuracy, comprehensiveness, and empathetic tone of AI and urologist responses to patient messages concerning common benign prostatic hyperplasia (BPH) questions across phases of care, and patient preference between them.

Methods: Cross-sectional study evaluating responses, generated by 2 AI chatbots and 4 urologists, to 20 BPH-related questions in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses were assessed by experts using Likert scales, and preferences and perceptions of authorship (chatbot vs. human) were rated by non-medical evaluators.

Results: Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses (n = 600 total). For volunteer evaluations, the mean (SD) empathy score for chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than that for urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); the mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly higher than that for urologists, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots, n = 240 total). For SME evaluations, the mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct) and not significantly different from that for urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than that for urologists, 1.6 (0.6) (adequate) (p < 0.001).
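The evaluation counts reported above follow directly from the study design. The short sketch below simply reproduces that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# Counts taken from the Methods and Results; names are illustrative only.
QUESTIONS = 20    # BPH-related patient questions
UROLOGISTS = 4    # human responders
CHATBOTS = 2      # AI responders
VOLUNTEERS = 5    # non-medical evaluators
SMES = 2          # subject matter experts

# Each responder answers every question once.
responses = QUESTIONS * (UROLOGISTS + CHATBOTS)

# Every evaluator rates every response independently.
volunteer_evaluations = VOLUNTEERS * responses
sme_evaluations = SMES * responses

print(responses, volunteer_evaluations, sme_evaluations)
```

The three printed values correspond to the 120 responses, the n = 600 volunteer evaluations, and the n = 240 SME evaluations stated in the Results.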

Conclusion: Answers to patient BPH messages generated by chatbots were evaluated by experts as being as accurate as, and more complete than, urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic than answers generated by urologists.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11680670
DOI: http://dx.doi.org/10.1007/s00345-024-05399-y

Similar Publications

A Risk Prediction Tool for Invasive Melanoma.

JAMA Dermatol

September 2025

Department of Population Health, QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia.

David C Whiteman, Catherine M Olsen, Huanwei Wang, Matthew H Law, Rachel E Neale

Importance: Increasingly, strategies to systematically detect melanomas invoke targeted approaches, whereby those at highest risk are prioritized for skin screening. Many tools exist to predict future melanoma risk, but most have limited accuracy and are potentially biased.

Objectives: To develop an improved melanoma risk prediction tool for invasive melanoma.

The impact of surgeons' visual angle misperception on acetabular cup positioning accuracy: a retrospective multicentre cohort study.

Int J Surg

September 2025

Shanghai Key Laboratory of Orthopaedic Implants, Department of Orthopaedic Surgery, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China.

Yuehao Hu, Minghao Jin, Yansong Qi, Yuanqing Mao, Zhenan Zhu

Background: Precise acetabular cup placement in total hip arthroplasty (THA) heavily relies on surgeons' visual judgment of angles. However, whether inherent visual angle misperception among surgeons affects surgical outcomes remains unclear. This study is the first to reveal that surgeons universally exhibit visual angle misperception, a key factor causing the cup implant positioning deviations in THA.

Student Perceptions of a Custom Artificial Intelligence Clinical Case Companion.

J Physician Assist Educ

September 2025

Andrew P. Chastain, DMS, PA-C, is an assistant professor at Butler University, Indianapolis, Indiana.

Andrew P Chastain, Chris Roman, Kevin M Bogenschutz

Introduction: Artificial intelligence tools show promise in supplementing traditional physician assistant education, particularly in developing clinical reasoning skills. However, limited research exists on custom Generative Pretrained Transformer (GPT) applications in physician assistant (PA) education. This study evaluated student experiences and perceptions of a custom GPT-based clinical reasoning tool.

High-resolution imaging system for integration into intelligent noncontact total body scanner.

J Biomed Opt

September 2025

Leibniz University Hannover, Hannover Centre for Optical Technologies, Hannover, Germany.

Lennart Jütte, Sandra González-Villà, Josep Quintana, Rafael Garcia, Bernhard Roth

Significance: Melanoma's rising incidence demands automatable high-throughput approaches for early detection such as total body scanners, integrated with computer-aided diagnosis. High-quality input data is necessary to improve diagnostic accuracy and reliability.

Aim: This work aims to develop a high-resolution optical skin imaging module and the software for acquiring and processing raw image data into high-resolution dermoscopic images using a focus stacking approach.

Protocol to produce a systematic Arenavirus and Hantavirus host-pathogen database: Project ArHa.

Wellcome Open Res

August 2025

Paul G. Allen School for Global Health, Washington State University, Pullman, Washington, USA.

David Simons, Ricardo Rivero, Ana Martinez-Checa Guiote, Harry Luke Mackenzie Gordon, Gregory C Milne

Arenaviruses and Hantaviruses, primarily hosted by rodents and shrews, represent significant public health threats due to their potential for zoonotic spillover into human populations. Despite their global distribution, the full impact of these viruses on human health remains poorly understood, particularly in regions like Africa, where data is sparse. Both virus families continue to emerge, with pathogen evolution and spillover driven by anthropogenic factors such as land use change, climate change, and biodiversity loss.
