Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, and early, accurate diagnosis is critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). The DWI images of 110 cases (AIS group: n = 55; healthy controls: n = 55) were provided to the AI models via standardized prompts. The models' responses were compared to radiologists' gold-standard evaluations, and performance metrics such as sensitivity, specificity, and diagnostic accuracy were calculated. Both models exhibited high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated a significantly lower specificity (3.6%) than Claude 3.5 Sonnet (74.5%). Agreement with radiologists was poor for ChatGPT-4o (κ = 0.036; 95% CI: -0.013, 0.085) but good for Claude 3.5 Sonnet (κ = 0.691; 95% CI: 0.558, 0.824). In AIS hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) showed greater accuracy than ChatGPT-4o (7.3%), with these differences being statistically significant (p < 0.05). This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite Claude 3.5 Sonnet's advantages, both models demonstrated notable limitations in accuracy, emphasizing the need for further development before achieving full clinical applicability. These findings underline the potential of AI tools in radiological diagnostics while acknowledging their current limitations.
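
The relationship between the reported sensitivity, specificity, and κ values can be illustrated with a short calculation. The sketch below is not the authors' analysis code: the 2x2 counts are an assumption, back-calculated from the percentages in the abstract and the group sizes of 55, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Cohen's kappa for a 2x2 table."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / total  # observed agreement with the reference standard

    # Agreement expected by chance, from the marginal totals.
    expected_pos = ((tp + fn) / total) * ((tp + fp) / total)
    expected_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = expected_pos + expected_neg

    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# ASSUMPTION: counts inferred from the reported percentages (55 AIS, 55 controls);
# the abstract does not state them directly.
chatgpt_4o = diagnostic_metrics(tp=55, fn=0, tn=2, fp=53)
claude_35_sonnet = diagnostic_metrics(tp=52, fn=3, tn=41, fp=14)

for name, metrics in [("ChatGPT-4o", chatgpt_4o), ("Claude 3.5 Sonnet", claude_35_sonnet)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Under these assumed counts, the computed κ values round to the figures reported above (0.036 and 0.691). This shows why a model that labels nearly every scan as AIS can reach 100% sensitivity while agreeing with radiologists barely better than chance.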
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11765597 | PMC |
http://dx.doi.org/10.3390/jcm14020571 | DOI Listing |
J Cancer Res Clin Oncol
September 2025
Department of Surgery, Mannheim School of Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
Purpose: The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases.
Methods: We simulated STS-MTBs using four LLMs: Llama 3.2-vision: 90b, Claude 3.
J Robot Surg
September 2025
Ayub Medical College, Abbottabad, Pakistan.
J Med Syst
September 2025
The Research Unit of Evidence Synthesis (TRUES), Faculty of Pharmaceutical Sciences, Naresuan University, Phitsanulok, Thailand.
The use of generative AI in systematic review workflows has gained attention for enhancing study selection efficiency. However, evidence on its screening performance remains inconclusive, and direct comparisons between different generative AI models are still limited. The objective of this study is to evaluate the performance of ChatGPT-4o and Claude 3.
Int J STD AIDS
September 2025
Infectious Diseases and Clinical Microbiology, Izmir Democracy University, Izmir, Turkey.
Background: This study compares three large language models (LLMs) in answering common HIV questions, given ongoing concerns about their accuracy and reliability in patient education. Methods: Models answered 63 HIV questions. Accuracy (5-point Likert), readability (Flesch-Kincaid, Gunning Fog, Coleman-Liau), and reliability (DISCERN, EQIP) were assessed.
J Yeungnam Med Sci
September 2025
Department of Dentistry, Malda Medical College and Hospital, Malda, India.
Background: Large language models (LLMs) have rapidly emerged as valuable tools in medical and dental education that support clinical reasoning, patient communication, and academic instruction. However, their effectiveness in conveying specialized content, such as fluoride-related dental knowledge, requires thorough evaluation. This study assesses the performance of four advanced LLMs: ChatGPT-4 (OpenAI), Claude 3.