Article Abstract

Importance: Large language models (LLMs) are increasingly being explored in clinical decision-making, but few studies have evaluated their performance on complex ophthalmology cases from clinical practice settings. Understanding whether open-weight, reasoning-enhanced LLMs can outperform proprietary models has implications for clinical utility and accessibility.

Objective: To evaluate the diagnostic accuracy, management decision-making, and cost of DeepSeek-R1 vs OpenAI o1 across diverse ophthalmic subspecialties.

Design, Setting, And Participants: This was a cross-sectional evaluation conducted using standardized prompts and model configurations. Clinical cases were sourced from JAMA Ophthalmology's Clinical Challenge articles, containing complex cases from clinical practice settings. Each case included an open-ended diagnostic question and a multiple-choice next-step decision. All cases were included without exclusions, and no human participants were involved. Data were analyzed from March 13 to March 30, 2025.

Exposures: DeepSeek-R1 and OpenAI o1 were evaluated using the Plan-and-Solve Plus (PS+) prompt engineering method.

Main Outcomes And Measures: Primary outcomes were diagnostic accuracy and next-step decision-making accuracy, each defined as the proportion of correct responses. Token cost analyses were performed to estimate expenses. Intermodel agreement was evaluated using Cohen κ, and the McNemar test was used to compare performance between the 2 models.
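The two statistics named above can be illustrated with a minimal sketch for paired binary outcomes (each case scored correct or incorrect for both models). The abstract does not report the paired 2x2 table or say whether the exact or chi-square form of the McNemar test was used, so the exact binomial form, the function names, and the counts below are assumptions for illustration only and are not study data.

```python
from math import comb


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 paired table.

    a: both models correct, b: only model A correct,
    c: only model B correct, d: both models incorrect.
    """
    n = a + b + c + d
    observed = (a + d) / n  # proportion of cases where the models agree
    # Agreement expected by chance, from each model's marginal rates.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of seeing k or fewer in one cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts (NOT from the study): 100 cases in total.
both_right, only_a, only_b, both_wrong = 50, 20, 10, 20
print(cohen_kappa(both_right, only_a, only_b, both_wrong))
print(mcnemar_exact(only_a, only_b))
```

Only the discordant cells (cases where exactly one model was correct) enter the McNemar test, whereas κ uses all four cells, which is why the two measures answer different questions about the same table.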

Results: A total of 422 clinical cases were included, spanning 10 subspecialties. DeepSeek-R1 achieved a higher diagnostic accuracy of 70.4% (297 of 422 cases) compared with 63.0% (266 of 422 cases) for OpenAI o1, a 7.3% difference (95% CI, 1.0%-13.7%; P = .02). For next-step decisions, DeepSeek-R1 was correct in 82.7% of cases (349 of 422 cases) vs OpenAI o1's accuracy of 75.8% (320 of 422 cases), a 6.9% difference (95% CI, 1.4%-12.3%; P = .01). Intermodel agreement was moderate (κ = 0.422; 95% CI, 0.375-0.469; P < .001). DeepSeek-R1 offered lower costs per query than OpenAI o1, with savings exceeding 66-fold (up to 98.5%) during off-peak pricing.
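The reported percentages follow directly from the case counts, and the "66-fold" and "98.5%" cost figures are two expressions of the same ratio. The short check below reproduces that arithmetic from the numbers stated in the abstract; the helper names are hypothetical, and the confidence intervals and P values are not recomputed because the paired data are not given.

```python
def pct(correct, total):
    """Proportion of correct responses as a percentage, to 1 decimal place."""
    return round(100 * correct / total, 1)


def fold_to_percent_saving(fold):
    """Convert an n-fold cost reduction into a percentage saving."""
    return round(100 * (1 - 1 / fold), 1)


N = 422  # total clinical cases reported in the abstract

# Diagnostic accuracy: DeepSeek-R1 vs OpenAI o1
diag_r1, diag_o1 = pct(297, N), pct(266, N)
diag_diff = pct(297 - 266, N)

# Next-step decision accuracy
step_r1, step_o1 = pct(349, N), pct(320, N)
step_diff = pct(349 - 320, N)

print(diag_r1, diag_o1, diag_diff)   # 70.4 63.0 7.3
print(step_r1, step_o1, step_diff)   # 82.7 75.8 6.9
print(fold_to_percent_saving(66))    # 98.5
```

The differences are computed from the raw counts rather than from the rounded percentages, which is why the diagnostic difference is 7.3% rather than 70.4 - 63.0 = 7.4.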

Conclusions And Relevance: DeepSeek-R1 outperformed OpenAI o1 in diagnosis and management across subspecialties while lowering operating costs, supporting the potential of open-weight, reinforcement learning-augmented LLMs as scalable and cost-saving tools for clinical decision support. Further investigations should evaluate safety guardrails and assess performance of self-hosted adaptations of DeepSeek-R1 with domain-specific ophthalmic expertise to optimize clinical utility.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412037
DOI: http://dx.doi.org/10.1001/jamaophthalmol.2025.2918

Similar Publications

The Association Between Meteorological Factors and Rotavirus Gastroenteritis Incidence in Japan: A Time-Series Analysis.

Cureus

August 2025

Division of International Health, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, JPN.

Introduction Rotavirus is the principal pathogen responsible for acute gastroenteritis and severe diarrhea in children worldwide and remains a significant public health threat. However, studies on the association between rotavirus gastroenteritis epidemics and meteorological factors in Japan are still scarce. In this study, we aimed to quantify the short-term effects of meteorological factors on the incidence of rotavirus gastroenteritis in Japan using advanced time-series modeling approaches.

Importance: Trisomy 13 (T13) and trisomy 18 (T18) are chromosomal abnormalities with high mortality rates in the first year of life. Understanding differences in long-term survival between children with full vs mosaic or partial trisomy is crucial for prognosis and health care planning.

Objective: To examine the differences in 10-year survival between children with full T13 and T18 vs those with mosaic or partial trisomy.

Background: Young adults account for up to 15% of all ischemic strokes, yet data from Latin America remain scarce. Understanding their clinical profile and outcomes is essential to inform targeted interventions and public health strategies. We aimed to characterize demographics, vascular risk factors, stroke etiology, access to acute reperfusion therapies, and 90-day outcomes in Argentine patients aged 18-50 years with ischemic stroke.

A clinical case-control study was conducted to identify risk factors for severe COVID-19 and to develop a predictive risk model as a reference for dynamically assessing disease severity in COVID-19 patients. A total of 410 patients with COVID-19 were included, of whom 132 had severe or critical disease. Clinical data were collected, and variables were then screened via LASSO regression analysis with 10-fold cross-validation.
