Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale.

Jae Hyuk Kim, Sun Kyung Kim, Jongmyung Choi, Youngho Lee

Digit Health

Department of Computer Engineering, Mokpo National University, Jeonnam, South Korea.

Published: January 2024

Background: Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks in healthcare settings.

Objective: This study aimed to assess the reliability of ChatGPT in performing emergency department (ED) triage using the Korean Triage and Acuity Scale (KTAS).

Methods: Two hundred and two virtual patient cases were built. An experienced ED physician established the gold standard triage classification for each case. Three additional human raters (ED paramedics) rated the virtual cases independently. The cases were also rated by two versions of the Chat Generative Pre-trained Transformer (ChatGPT 3.5 and 4.0). Inter-rater reliability was examined using Fleiss' kappa and the intra-class correlation coefficient (ICC).
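As an illustration of the Fleiss' kappa statistic named in the Methods, the following is a minimal Python sketch, not the authors' analysis code. The function name `fleiss_kappa`, the constant `KTAS_LEVELS`, and the toy ratings are hypothetical and exist only for this example; the abstract does not say which software was used.

```python
from collections import Counter

# KTAS has five acuity levels (1 = most urgent, 5 = least urgent).
KTAS_LEVELS = [1, 2, 3, 4, 5]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per case.

    ratings: list of cases, each a list with one category label per rater.
    categories: list of all possible category labels.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who assigned case i to category j
    counts = []
    for case in ratings:
        tally = Counter(case)
        counts.append([tally[c] for c in categories])

    # Observed agreement per case, then averaged over cases
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases

    # Chance agreement from the overall share of each category
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    # Undefined (division by zero) if every rating falls in one category
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 4 cases, each rated by 5 raters
    # (for example, four humans plus one ChatGPT version).
    toy = [
        [1, 1, 2, 1, 1],
        [3, 3, 3, 4, 3],
        [5, 4, 4, 4, 5],
        [2, 2, 2, 2, 2],
    ]
    print(round(fleiss_kappa(toy, KTAS_LEVELS), 3))
```

Fleiss' kappa treats the KTAS levels as unordered categories, so a disagreement between levels 1 and 2 counts the same as one between levels 1 and 5. That is one reason a study of an ordinal scale may report an ICC alongside it.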

Results: The kappa values for agreement between the four human raters and ChatGPT were .523 (version 4.0) and .320 (version 3.5). Across the five KTAS levels, performance was poor for patients at levels 1 and 5, as well as for case scenarios with additional text descriptions. Accuracy differed between the two versions: the ICC with the gold standard was .520 for version 3.5 and .802 for version 4.0.
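To make the ICC figures concrete, here is a hedged Python sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not state which ICC form the authors used, so this choice is an assumption, and the function name `icc_2_1` and the toy scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of cases, each a list with one numeric rating per rater
            (for example, [gold_standard_level, chatgpt_level]).
    """
    n = len(scores)       # number of cases
    k = len(scores[0])    # number of raters

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares for cases, raters, and residual error
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical toy data: gold standard level vs. one model's level
    toy = [[1, 2], [2, 2], [3, 3], [4, 3], [5, 4], [3, 3]]
    print(round(icc_2_1(toy), 3))
```

Unlike Fleiss' kappa, this statistic uses the ordering of the levels, and the absolute-agreement form also penalizes a rater who is consistently one level higher or lower than the reference.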

Conclusions: GPT-based raters showed a substantial level of inter-rater reliability when applying the KTAS, indicating the potential of GPT in emergency healthcare settings. Given the shortage of experienced personnel, this AI method may help improve triage accuracy.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10798071
DOI: http://dx.doi.org/10.1177/20552076241227132
