Importance: Large language models (LLMs) are increasingly being explored in clinical decision-making, but few studies have evaluated their performance on complex ophthalmology cases from clinical practice settings. Understanding whether open-weight, reasoning-enhanced LLMs can outperform proprietary models has implications for clinical utility and accessibility.
Objective: To evaluate the diagnostic accuracy, management decision-making, and cost of DeepSeek-R1 vs OpenAI o1 across diverse ophthalmic subspecialties.
Design, Setting, And Participants: This was a cross-sectional evaluation conducted using standardized prompts and model configurations. Clinical cases were sourced from JAMA Ophthalmology's Clinical Challenge articles, containing complex cases from clinical practice settings. Each case included an open-ended diagnostic question and a multiple-choice next-step decision. All cases were included without exclusions, and no human participants were involved. Data were analyzed from March 13 to March 30, 2025.
Exposures: DeepSeek-R1 and OpenAI o1 were evaluated using the Plan-and-Solve Plus (PS+) prompt engineering method.
Main Outcomes And Measures: Primary outcomes were diagnostic accuracy and next-step decision-making accuracy, defined as the proportion of correct responses. Token cost analyses were performed to estimate expenses. Intermodel agreement was evaluated using Cohen κ, and McNemar test was used to compare performance.
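As a rough illustration of the two agreement statistics named above, the following minimal Python sketch computes Cohen κ and a two-sided exact McNemar P value from a paired 2×2 table of correct and incorrect answers. The function names and the example counts are hypothetical and are not taken from the study, and the abstract does not say whether the exact or the chi-square form of the McNemar test was used, so the exact (binomial) form here is an assumption.

```python
from math import comb


def cohen_kappa(both_right, only_a, only_b, both_wrong):
    """Cohen kappa for two raters (models) scored correct/incorrect on the same cases."""
    n = both_right + only_a + only_b + both_wrong
    p_observed = (both_right + both_wrong) / n      # cases where the models agree
    a_right = (both_right + only_a) / n             # marginal accuracy of model A
    b_right = (both_right + only_b) / n             # marginal accuracy of model B
    # Agreement expected by chance if the two models were independent.
    p_expected = a_right * b_right + (1 - a_right) * (1 - b_right)
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar_exact_p(only_a, only_b):
    """Two-sided exact McNemar test: a binomial(n, 0.5) test on the discordant pairs."""
    n = only_a + only_b
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only (NOT the study's data):
# 50 cases both models right, 20 only model A right,
# 10 only model B right, 20 both wrong.
kappa = cohen_kappa(50, 20, 10, 20)
p_value = mcnemar_exact_p(20, 10)
print(f"kappa = {kappa:.3f}, McNemar exact P = {p_value:.3f}")
```

Only the discordant pairs (cases where exactly one model was correct) enter the McNemar test, which is why it suits a paired comparison of two models answering the same cases.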
Results: A total of 422 clinical cases were included, spanning 10 subspecialties. DeepSeek-R1 achieved a higher diagnostic accuracy of 70.4% (297 of 422 cases) compared with 63.0% (266 of 422 cases) for OpenAI o1, a 7.3 percentage point difference (95% CI, 1.0%-13.7%; P = .02). For next-step decisions, DeepSeek-R1 was correct in 82.7% of cases (349 of 422 cases) vs OpenAI o1's accuracy of 75.8% (320 of 422 cases), a 6.9 percentage point difference (95% CI, 1.4%-12.3%; P = .01). Intermodel agreement was moderate (κ = 0.422; 95% CI, 0.375-0.469; P < .001). DeepSeek-R1 offered lower costs per query than OpenAI o1, with savings exceeding 66-fold (up to 98.5%) during off-peak pricing.
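The headline figures can be rechecked from the reported counts with a short Python sketch. The counts are those stated in the Results; the helper names are hypothetical, and reading "66-fold" as a cost ratio (so the saving is 1 − 1/66) is an assumption about how the two cost figures relate.

```python
# Counts reported in the Results paragraph.
N_CASES = 422
CORRECT = {
    "diagnosis": {"DeepSeek-R1": 297, "OpenAI o1": 266},
    "next_step": {"DeepSeek-R1": 349, "OpenAI o1": 320},
}


def accuracy_pct(n_correct, n_total=N_CASES):
    """Accuracy as a percentage of all cases."""
    return 100.0 * n_correct / n_total


def fold_to_pct_saving(fold):
    """A cost that is `fold` times lower corresponds to a saving of (1 - 1/fold)."""
    return 100.0 * (1.0 - 1.0 / fold)


for task, counts in CORRECT.items():
    deepseek = accuracy_pct(counts["DeepSeek-R1"])
    o1 = accuracy_pct(counts["OpenAI o1"])
    # The difference is taken on the unrounded proportions, in percentage points.
    print(f"{task}: {deepseek:.1f}% vs {o1:.1f}%, difference {deepseek - o1:.1f} points")

print(f"66-fold cheaper = {fold_to_pct_saving(66):.1f}% saving")
```

Run as written, this reproduces the reported 70.4% vs 63.0% and 82.7% vs 75.8%. The diagnostic difference comes out at 7.3 rather than 7.4 points because it is computed before rounding, and a 66-fold cost ratio corresponds to a saving of about 98.5%.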
Conclusions And Relevance: DeepSeek-R1 outperformed OpenAI o1 in diagnosis and management across subspecialties while lowering operating costs, supporting the potential of open-weight, reinforcement learning-augmented LLMs as scalable and cost-saving tools for clinical decision support. Further investigations should evaluate safety guardrails and assess performance of self-hosted adaptations of DeepSeek-R1 with domain-specific ophthalmic expertise to optimize clinical utility.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412037 | PMC |
http://dx.doi.org/10.1001/jamaophthalmol.2025.2918 | DOI Listing |
Cureus
August 2025
Division of International Health, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, JPN.
Introduction Rotavirus is the principal pathogen responsible for acute gastroenteritis and severe diarrhea in children worldwide and remains a significant public health threat. However, studies on the association between rotavirus gastroenteritis epidemics and meteorological factors in Japan are still scarce. In this study, we aimed to quantify the short-term effects of meteorological factors on the incidence of rotavirus gastroenteritis in Japan using advanced time-series modeling approaches.
JAMA Netw Open
September 2025
Department of Epidemiology, University of Texas Health Science Center at Houston School of Public Health, Houston.
Importance: Trisomy 13 (T13) and trisomy 18 (T18) are chromosomal abnormalities with high mortality rates in the first year of life. Understanding differences in long-term survival between children with full vs mosaic or partial trisomy is crucial for prognosis and health care planning.
Objective: To examine the differences in 10-year survival between children with full T13 and T18 vs those with mosaic or partial trisomy.
JAMA Ophthalmol
September 2025
Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada.
Int J Stroke
September 2025
Centro integral de Neurología Vascular, FLENI.
Background: Young adults account for up to 15% of all ischemic strokes, yet data from Latin America remain scarce. Understanding their clinical profile and outcomes is essential to inform targeted interventions and public health strategies. We aimed to characterize demographics, vascular risk factors, stroke etiology, access to acute reperfusion therapies, and 90-day outcomes in Argentine patients aged 18-50 years with ischemic stroke.
BMC Pulm Med
September 2025
North China University of Science and Technology Affiliated Hospital, Tangshan, Hebei, China.
A clinical case‒control study was conducted to identify risk factors for severe COVID-19 and to develop a predictive risk model to provide a reference for the dynamic assessment of the severity of disease in COVID-19 patients. A total of 410 patients with COVID-19 were included in the study, of whom 132 had severe or critical cases. The clinical data of the patients were collected, and the variables were subsequently screened via LASSO regression analysis and 10-fold cross-validation.