Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

With the increasing application of large language models (LLMs) in the medical field, their potential in patient education and clinical decision support is becoming increasingly prominent. Given the complex pathogenesis, diverse treatment options, and lengthy rehabilitation periods of spinal cord injury (SCI), patients are increasingly turning to advanced online resources for relevant medical information. This study analyzed responses from four LLMs (ChatGPT-4o, Claude-3.5 Sonnet, Gemini-1.5 Pro, and Llama-3.1) to 37 SCI-related questions spanning pathogenesis, risk factors, clinical features, diagnostics, treatments, and prognosis. Quality and readability were assessed using the Ensuring Quality Information for Patients (EQIP) tool and Flesch-Kincaid metrics, respectively. Accuracy was independently scored by three senior spine surgeons using consensus scoring. Performance varied among the models. Gemini ranked highest on EQIP scores, suggesting superior information quality. Although the readability of all four models' responses was generally low, requiring college-level reading comprehension, all four could effectively simplify complex content. Notably, ChatGPT led in accuracy, achieving a significantly higher proportion of "Good" ratings (83.8%) than Claude (78.4%), Gemini (54.1%), and Llama (62.2%). Comprehensiveness scores were high across all models. The LLMs also exhibited strong self-correction abilities: after being prompted for revision, the accuracy of ChatGPT's and Claude's responses improved by 100% and 50%, respectively, while Gemini and Llama each improved by 67%. This study represents the first systematic comparison of leading LLMs in the context of SCI. While Gemini excelled in response quality, ChatGPT provided the most accurate and comprehensive responses.
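As background for the readability analysis mentioned above, the Flesch-Kincaid Grade Level is a fixed formula over sentence, word, and syllable counts: 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59. The sketch below is a minimal illustration of that formula; the regex tokenization and the naive syllable heuristic are assumptions for demonstration, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, discounting a trailing silent 'e'.
    # Real readability tools use pronunciation dictionaries (illustrative assumption).
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Scores of roughly 13 and above correspond to college-level text,
# the band the study reports for all four models' responses.
print(round(flesch_kincaid_grade("Spinal cord injury has a complex pathogenesis."), 1))
```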

Source: http://dx.doi.org/10.1007/s10916-025-02170-7

Publication Analysis

Top Keywords

large language (8)
spinal cord (8)
cord injury (8)
models llms (8)
quality readability (8)
language models' (4)
responses (4)
models' responses (4)
responses spinal (4)
injury comparative (4)

Similar Publications

Applications of Federated Large Language Model for Adverse Drug Reactions Prediction: Scoping Review.

J Med Internet Res

September 2025

Department of Information Systems and Cybersecurity, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX, 78249, United States, 1 (210) 458-6300.

Background: Adverse drug reactions (ADRs) present significant challenges in health care, where early prevention is vital for effective treatment and patient safety. Traditional supervised learning methods struggle with heterogeneous health care data due to its unstructured nature, regulatory constraints, and restricted access to sensitive personally identifiable information.

Objective: This review aims to explore the potential of federated learning (FL) combined with natural language processing and large language models (LLMs) to enhance ADR prediction.

Background: Primary liver cancer, particularly hepatocellular carcinoma (HCC), poses significant clinical challenges due to late-stage diagnosis, tumor heterogeneity, and rapidly evolving therapeutic strategies. While systematic reviews and meta-analyses are essential for updating clinical guidelines, their labor-intensive nature limits timely evidence synthesis.

Objective: This study proposes an automated literature screening workflow powered by large language models (LLMs) to accelerate evidence synthesis for HCC treatment guidelines.

Purpose: Speech disfluencies are common in individuals who do not stutter, with estimates suggesting a typical rate of six per 100 words. Factors such as language ability, processing load, planning difficulty, and communication strategy influence disfluency. Recent work has indicated that bilinguals may produce more disfluencies than monolinguals, but the factors underlying disfluency in bilingual children are poorly understood.

In this paper we analyse gender-based biases in the language within complex legal judgments. Our aims are: (i) to determine the extent to which purported biases discussed in the literature by feminist legal scholars are identifiable from the language of legal judgments themselves, and (ii) to uncover new forms of bias represented in the data that may promote further analysis and interpretation of the functioning of the legal system. We consider a large set of 2530 judgments in family law in Australia over a 20 year period, examining the way that male and female parties to a case are spoken to and about, by male and female judges, in relation to their capacity to provide care for children subject to the decision.

Evaluating anti-LGBTQIA+ medical bias in large language models.

PLOS Digit Health

September 2025

Department of Dermatology, Stanford University, Stanford, California, United States of America.

Large Language Models (LLMs) are increasingly deployed in clinical settings for tasks ranging from patient communication to decision support. While these models demonstrate race-based and binary gender biases, anti-LGBTQIA+ bias remains understudied despite documented healthcare disparities affecting these populations. In this work, we evaluated the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation.
