Background: Large language models (LLMs) are increasingly used in clinical decision support, and newly developed models have shown promise, yet their diagnostic performance for critically ill patients in intensive care unit (ICU) settings remains underexplored. This study evaluated the diagnostic accuracy, differential diagnosis quality, and response quality of four newly developed LLMs on critical illness cases.
Methods: In this cross-sectional comparative study, four newly developed LLMs (ChatGPT-4o, ChatGPT-o3, DeepSeek-V3, and DeepSeek-R1) were evaluated using 50 critical illness cases in ICU settings from published literature. Diagnostic accuracy and response quality were compared across models.
Results: A total of 50 critical illness cases were included. ChatGPT-o3 achieved the highest diagnostic accuracy at 72% (36/50; 95% CI 0.600-0.840), followed by DeepSeek-R1 at 68% (34/50; 95% CI 0.540-0.800), ChatGPT-4o at 64% (32/50; 95% CI 0.500-0.760), and DeepSeek-V3 at 32% (16/50; 95% CI 0.200-0.460). ChatGPT-o3, DeepSeek-R1, and ChatGPT-4o all significantly outperformed DeepSeek-V3, with no significant differences among the three. The median differential quality score was 5.0 for ChatGPT-o3 (IQR 5.0-5.0; 95% CI 5.0-5.0), DeepSeek-R1 (IQR 5.0-5.0; 95% CI 5.0-5.0), and ChatGPT-4o (IQR 4.0-5.0; 95% CI 4.5-5.0), and 4.0 for DeepSeek-V3 (IQR 3.0-5.0; 95% CI 4.0-5.0). ChatGPT-o3 and DeepSeek-R1 scored significantly higher than DeepSeek-V3; ChatGPT-4o showed a non-significant trend toward better performance. All models received high Likert ratings for response completeness, clarity, and usefulness. ChatGPT-o3, DeepSeek-R1, and ChatGPT-4o each showed a trend toward better response quality compared to DeepSeek-V3, although no significant differences were observed among the models.
Conclusions: The newly developed models, especially the reasoning models, demonstrated strong potential in supporting diagnosis in critical illness cases in ICU settings. Domain-specific fine-tuning could further enhance their diagnostic accuracy. Notably, the open-source reasoning model DeepSeek-R1 performed competitively, suggesting strong potential for scalable deployment in resource-limited settings.
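As an illustration of the arithmetic behind the accuracy figures in the Results, the sketch below recomputes each model's accuracy from the reported counts (correct diagnoses out of 50 cases) and attaches a Wilson score interval. The abstract does not state how its confidence intervals were derived, so the Wilson method here is a substituted technique chosen for illustration, and its bounds should not be expected to match the reported ones exactly. The function and variable names are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Correct-diagnosis counts as reported in the Results (out of 50 cases).
N_CASES = 50
correct = {
    "ChatGPT-o3": 36,
    "DeepSeek-R1": 34,
    "ChatGPT-4o": 32,
    "DeepSeek-V3": 16,
}

for model, k in correct.items():
    low, high = wilson_interval(k, N_CASES)
    # Assumption: Wilson interval, not necessarily the study's own method.
    print(f"{model}: {k / N_CASES:.0%} ({k}/{N_CASES}; Wilson 95% CI {low:.3f}-{high:.3f})")
```

With only 50 cases per model, any such interval is wide, which is consistent with the abstract finding no significant differences among the three best-performing models.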
DOI: http://dx.doi.org/10.1016/j.ijmedinf.2025.106088
Scand J Rheumatol
September 2025
The Parker Institute, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Frederiksberg, Denmark.
Objective: Pain hypersensitivity and hypersensitivity to other sensory modalities (visual, auditory, olfactory, and tactile) are considered defining features in nociplastic pain states. A self-report measure of sensory sensitivity may help to characterize sensory profiles across pain populations. This study aimed to evaluate the psychometric properties of a newly developed Danish nine-item Sensory Sensitivity Profile (SSP) questionnaire in patients with fibromyalgia.
Eur J Case Rep Intern Med
August 2025
Department of Internal Medicine, Wayne State University School of Medicine, Trinity Health Oakland Hospital, Pontiac, USA.
Background: Invasive central nervous system (CNS) aspergillosis is rare among human immunodeficiency virus (HIV)-positive patients due to preserved neutrophil function, despite significant CD4+ T-cell depletion. Diagnosis typically requires histopathologic confirmation, but polymerase chain reaction (PCR) testing has introduced new challenges due to its high sensitivity but limited specificity.
Case Presentation: We describe a newly diagnosed 43-year-old HIV-positive male with concurrent Hodgkin lymphoma who presented with progressive neurological decline and a ring-enhancing brain lesion.
J Hepatocell Carcinoma
September 2025
Department of Liver Disease, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, People's Republic of China.
Objective: Anoikis is an anchorage-dependent programmed cell death implicated in multiple pathological processes of cancers; however, the prognostic value of anoikis-related genes (ANRGs) in hepatocellular carcinoma (HCC) remains unclear. Our study aims to develop an ANRGs-based prediction model to improve prognostic assessment in HCC patients.
Methods: The RNA-seq profile was performed to estimate the expression of ANRGs in HCC patients.
J Appl Stat
February 2025
Department of Mathematics and State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, People's Republic of China.
We conduct gene mutation rate estimations via developing mutual information and Ewens sampling based convolutional neural network (CNN) and machine learning algorithms. More precisely, we develop a systematic methodology through constructing a CNN. Meanwhile, we develop two machine learning algorithms to study protein production with target gene sequences and protein structures.
Front Med (Lausanne)
August 2025
Department of Nursing, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Background: Inflammatory bowel disease (IBD) is a chronic condition characterized by the need for highly individualized treatment plans, requiring patients to make numerous complex medical decisions. Shared decision-making (SDM) has proven effective in improving treatment outcomes, patient satisfaction, and adherence in IBD management; however, its clinical implementation remains challenging. In China, formal SDM nurse roles have not yet been established.