Performance of ChatGPT, Gemini and DeepSeek for non-critical triage support using real-world conversations in emergency department.

Sukyo Lee, Sumin Jung, Jong-Hak Park, Hanjin Cho, Sungwoo Moon, Sejoong Ahn

BMC Emerg Med

Department of Emergency Medicine, Korea University Ansan Hospital, Ansan-si, 15355, Republic of Korea.

Published: September 2025

Background: Timely and accurate triage is crucial for emergency department (ED) care. Recently, there has been growing interest in applying large language models (LLMs) to support triage decision-making. However, most existing studies have evaluated these models using simulated scenarios rather than real-world clinical cases. Therefore, we evaluated the performance of multiple commercial LLMs for non-critical triage support in the ED using real-world clinical conversations.

Methods: We retrospectively analyzed real-world triage conversations that had been prospectively collected from three tertiary hospitals in South Korea. Multiple commercial LLMs (OpenAI GPT-4o, GPT-4.1, and O3; Google Gemini 2.0 flash, Gemini 2.5 flash, and Gemini 2.5 pro; and DeepSeek V3 and DeepSeek R1) were evaluated for their accuracy in triaging patient urgency based solely on unsummarized dialogue. The Korean Triage and Acuity Scale (KTAS) level assigned by triage nurses was used as the gold standard for evaluating the LLM classifications. Model performance was assessed under both a zero-shot prompting condition and a few-shot prompting condition that included representative examples.
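To illustrate the difference between the two prompting conditions, here is a minimal sketch of how a zero-shot and a few-shot prompt might be assembled. The abstract does not give the study's actual prompt, so the instruction text, the `build_prompt` helper, and the example dialogue are all hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the study's real prompt is not given in the abstract.
INSTRUCTION = (
    "You are assisting emergency department triage. "
    "Read the conversation and answer with a single KTAS level from 1 to 5."
)


def build_prompt(dialogue: str, examples: Sequence[Tuple[str, int]] = ()) -> str:
    """Build a zero-shot prompt if `examples` is empty, otherwise a few-shot prompt.

    Each example is a (conversation, nurse-assigned KTAS level) pair.
    """
    parts = [INSTRUCTION]
    # Few-shot condition: prepend labeled example conversations.
    for i, (example_dialogue, ktas_level) in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\nConversation: {example_dialogue}\nKTAS: {ktas_level}"
        )
    # The conversation to classify always comes last, with the label left open.
    parts.append(f"Conversation: {dialogue}\nKTAS:")
    return "\n\n".join(parts)


# Illustrative, made-up dialogue (not from the study data).
zero_shot = build_prompt("Nurse: What brings you in? Patient: I twisted my ankle.")
few_shot = build_prompt(
    "Nurse: What brings you in? Patient: I twisted my ankle.",
    examples=[("Nurse: What is wrong? Patient: Mild sore throat since yesterday.", 5)],
)
```

The only difference between the two conditions in this sketch is whether labeled example conversations precede the conversation to be classified.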

Results: A total of 1,057 triage cases were included in the analysis. Among the models, Gemini 2.5 flash achieved the highest accuracy (73.8%), specificity (88.9%), and PPV (94.0%). Gemini 2.5 pro demonstrated the highest sensitivity (90.9%) and F1-score (82.4%), though with lower specificity (23.3%). GPT-4.1 combined high accuracy (70.6%) and sensitivity (81.3%) with a practical response time (1.79 s). Performance varied widely between models and even between different versions from the same vendor. With few-shot prompting, most models showed further improvements in accuracy and F1-score.
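For readers less familiar with the reported metrics, the sketch below computes accuracy, sensitivity, specificity, PPV, and F1-score from a binary confusion matrix using their standard definitions. The abstract does not state which urgency class was treated as positive or how the counts were distributed, so the `ConfusionCounts` type and the numbers used here are purely hypothetical and are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # positive by nurse KTAS, positive by model
    fp: int  # negative by nurse KTAS, positive by model
    tn: int  # negative by nurse KTAS, negative by model
    fn: int  # positive by nurse KTAS, negative by model


def triage_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard binary classification metrics; assumes all denominators are non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # recall on the positive class
    specificity = c.tn / (c.tn + c.fp)  # recall on the negative class
    ppv = c.tp / (c.tp + c.fp)          # positive predictive value (precision)
    accuracy = (c.tp + c.tn) / total
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = triage_metrics(ConfusionCounts(tp=80, fp=20, tn=60, fn=40))
```

Because F1 depends only on PPV and sensitivity, a model can reach a high F1-score while its specificity stays low, which is the pattern reported for Gemini 2.5 pro.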

Conclusions: LLMs can accurately triage ED patient urgency using real-world clinical conversations. Several models demonstrated both high sensitivity and acceptable response times, supporting the feasibility of LLMs as non-critical triage support tools in diverse clinical environments. These findings apply to non-critical patients (KTAS 3-5), and further research should address integration with objective clinical data and real-time workflows.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12403343
DOI: http://dx.doi.org/10.1186/s12873-025-01337-2

Similar Publications

The Role of Telemedicine in Emergency Department Triage and Patient Care: A Systematic Review.

Cureus

December 2024

College of Medicine, Jazan University, Jazan, SAU.

Anas A Ahmed, Mohammed E Mojiri, Ali A Daghriri, Ohoud A Hakami, Reem F Alruwaili

Overcrowding in emergency departments (EDs) is a global challenge, leading to prolonged waiting times and adverse patient outcomes. Telemedicine has emerged as a promising solution, enabling remote consultation, triage, and real-time specialist input. Despite its growing application, limited systematic research exists on its specific role in ED triage and care.

Improving Quality of Care for Vacation-Related Emergency Department Visits: A Narrative Review of Patient Satisfaction and Contributing Factors.

Cureus

November 2024

Department of Emergency Medicine, Ibn Sina Hospital, Makkah, SAU.

Mahmoud S Alsomali, Mohammed A Altawili, Modaf Mohammed Albishi, Alharbi Naif Fahad D, Kalied Faihan M Al Otaibi

Emergency departments (EDs) encounter substantial challenges during peak vacation periods, including increased patient volumes, limited access to medical histories, language and cultural barriers, insurance complexities, and disruptions in continuity of care. These factors strain emergency department operations, resulting in prolonged wait times, diagnostic errors, and compromised care quality. This study reviews the literature to identify patient satisfaction indicators and common challenges and evaluate strategies to improve patient outcomes during vacation-related emergency department visits.

Reduced Pharmacological Intervention of Prehospital Services for Acute Alcohol Intoxication during the COVID-19 Pandemic in A Large District of Southern Italy.

J Clin Med

May 2024

Department of Internal Medicine, Hospital "A. Maresca"-ASL Naples 3 Sud, Via Montedoro, 1, 80059 Torre del Greco (Naples), Italy.

Arcangela Giustino, Annamaria Natola, Giovanni Savoia, Maria Antonietta De Salvia, Carmine Finelli

Stress during a pandemic increases the risk of alcohol consumption, which may require pharmacological management. An observational single-center retrospective study was conducted from 1 January 2018 to 31 December 2021, divided into two 2-year periods (2018-2019 and 2020-2021). This study focused on calls to one of the emergency departments (EDs) of seven hospitals in the Bari (Italy) metropolitan area for patients requiring emergency services (ESs) who were either admitted or, because they refused, not admitted.

Frequency, Prognosis, and Clinical Features of Unexpected versus Expected Cardiac Arrest in the Emergency Department: A Retrospective Analysis.

J Clin Med

April 2024

Department of Emergency Medicine, Medical University of Gdańsk, M. Skłodowskiej-Curie 3a Street, 80-210 Gdańsk, Poland.

Karolina Szaruta-Raflesz, Tomasz Łopaciński, Mariusz Siemiński

Though out-of-hospital cardiac arrest (OHCA) is widely reported, data on in-hospital cardiac arrest (IHCA), and especially on cardiac arrest (CA) in the emergency department (CAED), are scarce. This study aimed to determine the frequency, prevalence, and clinical features of unexpected CAED and compare the data with those of expected CAED. We defined unexpected CAED as CA occurring in patients in non-critical ED-care areas who were classified as not requiring strict monitoring.
