Performance of ChatGPT-4o in the diagnostic workup of fever among returning travellers requiring hospitalization: a validation study.

Dana Yelin, Neta Shirin, Itai Harris, Yovel Peretz, Dafna Yahav, Eli Schwartz, Eyal Leshem, Ili Margalit

J Travel Med

Infectious Diseases Unit, Sheba Medical Center, Tel HaShomer, Ramat Gan, Israel.

Published: April 2025

Background: Febrile illness in returned travellers presents a diagnostic challenge in non-endemic settings. Chat generative pretrained transformer (ChatGPT) has the potential to assist in medical tasks, yet its diagnostic performance in clinical settings has rarely been evaluated. We conducted a validation assessment of ChatGPT-4o's performance in the workup of fever in returning travellers.

Methods: We retrieved the medical records of returning travellers hospitalized with fever during 2009-2024. Their clinical scenarios at the time of presentation to the emergency department were entered as prompts to ChatGPT-4o in a detailed, uniform format. The model was further prompted with four consistent questions concerning the differential diagnosis and recommended workup. To avoid training, we kept the model blinded to the final diagnosis. Our primary outcome was ChatGPT-4o's success rate in predicting the final diagnosis when requested to specify the top three differential diagnoses. Secondary outcomes were success rates when prompted to specify the single most likely diagnosis, and all necessary diagnostics. We also assessed ChatGPT-4o as a predictive tool for malaria and qualitatively evaluated its failures.

Results: ChatGPT-4o predicted the final diagnosis in 68% [95% confidence interval (CI) 59-77%], 78% (95% CI 69-85%) and 83% (95% CI 74-89%) of the 114 cases, when prompted to specify the most likely diagnosis, top three diagnoses and all possible diagnoses, respectively. ChatGPT-4o showed a sensitivity of 100% (95% CI 93-100%) and a specificity of 94% (95% CI 85-98%) for predicting malaria. The model failed to provide the final diagnosis in 18% (20/114) of cases, primarily by failing to predict globally endemic infections (16/21, 76%).
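The proportions and intervals above can be reproduced in form with a short calculation. The following is a minimal sketch, assuming a Wilson score interval; the abstract does not state which interval method the authors used. The only counts taken from the abstract are 20 failures out of 114 cases. The malaria confusion-matrix counts (`tp`, `fn`, `tn`, `fp`) are hypothetical placeholders chosen to sum to 114 and are not taken from the paper.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Counts stated in the abstract: the model missed 20 of 114 final diagnoses.
failure_rate = 20 / 114
fail_low, fail_high = wilson_interval(20, 114)
print(f"failure rate: {failure_rate:.1%} (95% CI {fail_low:.1%} to {fail_high:.1%})")

# HYPOTHETICAL malaria counts, for illustration only (not from the paper).
tp, fn, tn, fp = 50, 0, 60, 4
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
print(f"sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

The Wilson interval is used here because it stays within 0 to 100% even when the observed proportion is 100%, as with the reported malaria sensitivity; an exact (Clopper-Pearson) interval would be an equally reasonable choice and gives slightly different bounds.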

Conclusions: ChatGPT-4o demonstrated high diagnostic accuracy when prompted with real-life scenarios of febrile returning travellers presenting to the emergency department, especially for malaria. Model training is expected to improve performance and facilitate diagnostic decision-making in the field.

DOI: http://dx.doi.org/10.1093/jtm/taaf005

Similar Publications

Broad-Spectrum Antibiotic Use at the End of Life in Patients With Advanced Cancer.

JAMA Netw Open

September 2025

Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, South Korea.

Jeong-Han Kim, Jiwon Yu, Shin Hye Yoo, Jin-Ah Sim, Bhumsuk Keam

Importance: Patients with advanced cancer frequently receive broad-spectrum antibiotics, but changing use patterns across the end-of-life trajectory remain poorly understood.

Objective: To describe the patterns of broad-spectrum antibiotic use across defined end-of-life intervals in patients with advanced cancer.

Design, Setting, And Participants: This nationwide, population-based, retrospective cohort study used data from the South Korean National Health Insurance Service database to examine broad-spectrum antibiotic use among patients with advanced cancer who died between July 1, 2002, and December 31, 2021.

Left Atrial Appendage Occlusion vs Anticoagulants in Dialysis With Atrial Fibrillation.

JAMA Netw Open

September 2025

Department of Internal Medicine, University of Arkansas for Medical Sciences, Little Rock.

Gaurav Dhar, Milind A Phadnis, Suzanne L Hunt, Holly E Du, Vincz Ong

Importance: Patients with kidney failure (KF) receiving long-term dialysis have increased incidence of atrial fibrillation (AF). Patients with KF and AF have increased risk of stroke, death, and bleeding compared with age-matched cohorts. In KF, the use of oral anticoagulants (OACs) increases hemorrhage risk, offsetting potential benefits and making left atrial appendage occlusion (LAAO) a potentially promising solution for risk reduction in AF.

The nature of fatigue in amyotrophic lateral sclerosis: a systematic review and meta-analysis.

Acta Neurol Belg

September 2025

Neuroscience Research Australia, University of New South Wales, Sydney, Australia.

Mansur A Kutlubaev, Ekaterina V Pervushina, Matthew C Kiernan

Objectives: Patients diagnosed with amyotrophic lateral sclerosis (ALS) typically describe symptoms of fatigue. Despite this frequency, the underlying mechanisms of fatigue are poorly understood, and are likely multifactorial. To help clarify mechanisms, the present systematic review was undertaken to determine the risk factors related to fatigue in ALS.

Consensus on clinical outcome measures for lumbar spinal stenosis: recommendations from the ISSLS lumbar spinal stenosis taskforce.

Eur Spine J

September 2025

Hong Kong Polytechnic University, Hong Kong, China.

David B Anderson, Chelsia Cheung, Christy Lane, Markus Melloh, Jiri Dvorak

Purpose: The purpose of this study was to determine, through a Delphi process, a list of outcome measures for clinicians to use when assessing individuals with Lumbar Spinal Stenosis (LSS).

Methods: A three-phase Delphi process was conducted by the International Society for the Study of the Lumbar Spine (ISSLS) Lumbar Spinal Stenosis Taskforce, including two online surveys, two virtual meetings, and three in-person consensus meetings at the ISSLS annual conferences (2023-2025). Participants evaluated and ranked outcome measures for LSS, with final endorsement requiring > 66% agreement.

Pulmonary Emphysema at Low-Dose CT Lung Cancer Screening: Assessment Tools and AI Challenges.

Radiology

September 2025

Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi", University of Bologna, Cesena, Italy.

Mario Mascalchi, Stefano Diciotti
