ChatGPT-4 Knows Its A B C D E but Cannot Cite Its Source.

Diane Ghanem, Alexander R Zhu, Whitney Kagabo, Greg Osgood, Babar Shafiq

JB JS Open Access

Department of Orthopaedic Surgery, The Johns Hopkins Hospital, Baltimore, Maryland.

Published: September 2024

Introduction: The artificial intelligence language model Chat Generative Pretrained Transformer (ChatGPT) has shown potential as a reliable and accessible educational resource in orthopaedic surgery. Yet, the accuracy of the references behind the provided information remains elusive, which poses a concern for maintaining the integrity of medical content. This study aims to examine the accuracy of the references provided by ChatGPT-4 concerning the Airway, Breathing, Circulation, Disability, Exposure (ABCDE) approach in trauma surgery.

Methods: Two independent reviewers critically assessed 30 ChatGPT-4-generated references supporting the well-established ABCDE approach to trauma protocol, grading them as 0 (nonexistent), 1 (inaccurate), or 2 (accurate). All discrepancies between the ChatGPT-4 and PubMed references were carefully reviewed and bolded. Cohen's Kappa coefficient was used to examine the agreement of the accuracy scores of the ChatGPT-4-generated references between reviewers. Descriptive statistics were used to summarize the mean reference accuracy scores. To compare the variance of the means across the 5 categories, one-way analysis of variance was used.
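As an illustration of the inter-reviewer agreement statistic named in the Methods, the following is a minimal sketch of unweighted Cohen's Kappa for two reviewers grading the same references on the 0/1/2 scale. The function name `cohens_kappa` and the ten example grades are hypothetical and are not the study's data; the abstract does not say whether a weighted variant was used, so the unweighted form is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items both raters graded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal grade frequencies,
    # summed over every grade either rater used.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    grades = set(freq_a) | set(freq_b)
    p_chance = sum(freq_a[g] * freq_b[g] for g in grades) / (n * n)

    if p_chance == 1.0:
        return 1.0  # both raters gave one identical grade to every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical grades for ten references (0 = nonexistent, 1 = inaccurate,
# 2 = accurate). These values are illustrative only, NOT the study's data.
reviewer_1 = [2, 2, 1, 0, 1, 2, 1, 1, 2, 0]
reviewer_2 = [2, 2, 1, 0, 2, 2, 1, 1, 2, 1]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a simple percentage when two reviewers assign categorical grades.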

Results: ChatGPT-4 had an average reference accuracy score of 66.7%. Of the 30 references, only 43.3% were accurate and deemed "true" while 56.7% were categorized as "false" (43.3% inaccurate and 13.3% nonexistent). The accuracy was consistent across the 5 trauma protocol categories, with no significant statistical difference (p = 0.437).
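The percentages in the Results can be checked with a short calculation. The sketch below assumes reference counts of 13 accurate, 13 inaccurate, and 4 nonexistent; these counts are not stated in the abstract but are inferred here as the whole numbers out of 30 that round to the reported 43.3%, 43.3%, and 13.3%. It also shows why the two "false" components, which sum to 56.6% after rounding, are reported together as 56.7%.

```python
TOTAL_REFERENCES = 30  # stated in the abstract

# Counts inferred from the reported percentages (an assumption, not a
# figure given in the abstract).
counts = {"accurate": 13, "inaccurate": 13, "nonexistent": 4}
assert sum(counts.values()) == TOTAL_REFERENCES


def percent(count, total=TOTAL_REFERENCES):
    """Share of the total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {grade: percent(k) for grade, k in counts.items()}

# "False" references: inaccurate plus nonexistent, rounded once from the
# combined count rather than by adding two already-rounded percentages.
false_share = percent(counts["inaccurate"] + counts["nonexistent"])

for grade, share in shares.items():
    print(f"{grade}: {share}%")
print(f"false (inaccurate + nonexistent): {false_share}%")
```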

Discussion: With 57% of references being inaccurate or nonexistent, ChatGPT-4 has fallen short in providing reliable and reproducible references, a concerning finding for the safety of using ChatGPT-4 for professional medical decision making without thorough verification. Only if used cautiously, with cross-referencing, can this language model act as an adjunct learning tool that can enhance comprehensiveness as well as knowledge rehearsal and manipulation.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368215
DOI: http://dx.doi.org/10.2106/JBJS.OA.24.00099

Publication Analysis

Top Keywords

language model

accuracy references

references provided

abcde approach

approach trauma

chatgpt-4-generated references

trauma protocol

accuracy scores

reference accuracy

references

Similar Publications

A real-world study on the safety and efficacy of therapeutic plasma exchange in patients with Alzheimer's disease.

J Alzheimers Dis

September 2025

Paula Costa-Urrutia Medical Affairs, Terumo BCT, Edificio Think MVD, Montevideo, Uruguay.

Fernando Taragano, Daniel Seinhart, Patricia Epstein, Vanina Sylvestre, Cecilia Barañano

Background: Therapeutic plasma exchange (TPE) with albumin replacement has emerged as a potential treatment for Alzheimer's disease (AD). The AMBAR trial showed that TPE could slow cognitive and functional decline, along with changes in core and inflammatory biomarkers in cerebrospinal fluid.

Objective: To evaluate the safety and effectiveness of TPE in a real-world setting in Argentina.

Compositionality in the semantic network: a model-driven representational similarity analysis.

Cereb Cortex

August 2025

Department of Psychology, University of Milano-Bicocca, Milan, Italy.

Marco Ciapparelli, Marco Marelli, William Graves, Carlo Reverberi

Semantic composition allows us to construct complex meanings (e.g., "dog house", "house dog") from simpler constituents ("dog", "house").

Commentary on "DeepSeek-R1 and GPT-4 are comparable in a complex diagnostic challenge: a historical control study".

Int J Surg

September 2025

The Third Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China.

Hanzhe Lv, Longhao Chen, Zhizhen Lv, Lijiang Lv

Guideline adherence in surgical decisions for T1 colorectal cancer after endoscopic resection: large language models vs clinicians.

Int J Surg

September 2025

Digestive Endoscopy Center, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China.

Liangtang Zeng, Cao Qinxing, Junyuan Deng, Junnan Hu, Minghui Pang

Background: Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.

Methods: This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals.

Leveraging Language Model, Crystal Structure Prediction and First-Principles Calculation for Material Design.

J Chem Inf Model

September 2025

Songshan Lake Materials Laboratory, Dongguan 523808, PR China.

Lei Zhang, Ben Ni, Kaiyang Xu, Yiru Huang, Qingfang Li

Large language models (LLMs) have demonstrated transformative potential for materials discovery in condensed matter systems, but their full utility requires both broader application scenarios and integration with ab initio crystal structure prediction (CSP), density functional theory (DFT) methods and domain knowledge to benefit future inverse material design. Here, we develop an integrated computational framework combining language model-guided materials screening with genetic algorithm (GA) and graph neural network (GNN)-based CSP methods to predict new photovoltaic material. This LLM + CSP + DFT approach successfully identifies a previously overlooked oxide material with unexpected photovoltaic potential.
