DeepSeek-R1 vs OpenAI o1 for Ophthalmic Diagnoses and Management Plans.

David Mikhail, Andrew Farah, Jason Milad, Andrew Mihalache, Daniel Milad, Fares Antaki, Michael Balas, Marko M Popovic, Rajeev H Muni, Pearse A Keane, Renaud Duval

JAMA Ophthalmol

Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada.

Published: September 2025

Importance: Large language models (LLMs) are increasingly being explored in clinical decision-making, but few studies have evaluated their performance on complex ophthalmology cases from clinical practice settings. Understanding whether open-weight, reasoning-enhanced LLMs can outperform proprietary models has implications for clinical utility and accessibility.

Objective: To evaluate the diagnostic accuracy, management decision-making, and cost of DeepSeek-R1 vs OpenAI o1 across diverse ophthalmic subspecialties.

Design, Setting, and Participants: This was a cross-sectional evaluation conducted using standardized prompts and model configurations. Clinical cases were sourced from JAMA Ophthalmology's Clinical Challenge articles, which present complex cases from clinical practice settings. Each case included an open-ended diagnostic question and a multiple-choice next-step decision. All cases were included without exclusions, and no human participants were involved. Data were analyzed from March 13 to March 30, 2025.

Exposures: DeepSeek-R1 and OpenAI o1 were evaluated using the Plan-and-Solve Plus (PS+) prompt engineering method.

Main Outcomes and Measures: Primary outcomes were diagnostic accuracy and next-step decision-making accuracy, defined as the proportion of correct responses. Token cost analyses were performed to estimate expenses. Intermodel agreement was evaluated using Cohen κ, and the McNemar test was used to compare performance.
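
The abstract does not state which software or which McNemar variant (exact binomial or χ² with continuity correction) was used. As a minimal sketch, assuming each model's answer to each case is scored 1 (correct) or 0 (incorrect), the two statistics named above could be computed as follows. The function names and the 8-case data are hypothetical and are not the study's data.

```python
from math import comb


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two raters giving binary labels (1 = correct, 0 = incorrect)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    # Observed agreement: share of cases where both models had the same outcome.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each model's marginal rate of correct answers.
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(a: list[int], b: list[int]) -> float:
    """Two-sided exact McNemar P value for paired binary outcomes (assumed variant)."""
    # Only discordant pairs carry information about a difference in accuracy.
    only_a = sum(x == 1 and y == 0 for x, y in zip(a, b))
    only_b = sum(x == 0 and y == 1 for x, y in zip(a, b))
    n = only_a + only_b
    if n == 0:
        return 1.0
    # Under H0 each discordant pair is equally likely to favour either model,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-case correctness (1 = correct) for two models on 8 cases.
model_a = [1, 1, 1, 0, 1, 0, 1, 1]
model_b = [1, 0, 1, 0, 0, 0, 1, 1]
print(cohen_kappa(model_a, model_b))    # 0.5
print(mcnemar_exact(model_a, model_b))  # 0.5
```

The exact test is used here because it stays valid when discordant pairs are few; with many discordant cases the χ² approximation tends to give similar P values.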

Results: A total of 422 clinical cases were included, spanning 10 subspecialties. DeepSeek-R1 achieved a higher diagnostic accuracy of 70.4% (297 of 422 cases) compared with 63.0% (266 of 422 cases) for OpenAI o1, a 7.3% difference (95% CI, 1.0%-13.7%; P = .02). For next-step decisions, DeepSeek-R1 was correct in 82.7% of cases (349 of 422 cases) vs OpenAI o1's accuracy of 75.8% (320 of 422 cases), a 6.9% difference (95% CI, 1.4%-12.3%; P = .01). Intermodel agreement was moderate (κ = 0.422; 95% CI, 0.375-0.469; P < .001). DeepSeek-R1 had lower costs per query than OpenAI o1; during off-peak pricing its cost was more than 66-fold lower (savings of up to 98.5%).
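
The reported percentages can be checked directly from the counts. Each accuracy is the number of correct responses divided by 422, the reported differences are absolute (percentage-point) gaps, and a 66:1 cost ratio corresponds to a saving of 1 − 1/66 ≈ 98.5%. This is a quick arithmetic sketch; the helper names are hypothetical, and the confidence intervals are not reproduced because they depend on the paired per-case results, which the abstract does not report.

```python
def accuracy_gap(correct_a: int, correct_b: int, n: int) -> tuple[float, float, float]:
    """Return accuracy of A, accuracy of B, and A minus B, in percent (1 decimal)."""
    acc_a = 100 * correct_a / n
    acc_b = 100 * correct_b / n
    # The gap is taken from the unrounded accuracies, then rounded.
    return round(acc_a, 1), round(acc_b, 1), round(acc_a - acc_b, 1)


def savings_percent(fold: float) -> float:
    """Percent saved when one option costs 1/fold as much as the other."""
    return round(100 * (1 - 1 / fold), 1)


N_CASES = 422  # total cases reported in the abstract

# Correct-response counts reported for DeepSeek-R1 and OpenAI o1.
print(accuracy_gap(297, 266, N_CASES))  # (70.4, 63.0, 7.3)
print(accuracy_gap(349, 320, N_CASES))  # (82.7, 75.8, 6.9)
print(savings_percent(66))              # 98.5
```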

Conclusions and Relevance: DeepSeek-R1 outperformed OpenAI o1 in diagnosis and management across subspecialties while lowering operating costs, supporting the potential of open-weight, reinforcement learning-augmented LLMs as scalable and cost-saving tools for clinical decision support. Further investigations should evaluate safety guardrails and assess the performance of self-hosted adaptations of DeepSeek-R1 with domain-specific ophthalmic expertise to optimize clinical utility.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412037
DOI: http://dx.doi.org/10.1001/jamaophthalmol.2025.2918

Similar Publications

The Association Between Meteorological Factors and Rotavirus Gastroenteritis Incidence in Japan: A Time-Series Analysis.

Cureus

August 2025

Division of International Health, Graduate School of Medical and Dental Sciences, Niigata University, Niigata, JPN.

Keita Wagatsuma

Introduction: Rotavirus is the principal pathogen responsible for acute gastroenteritis and severe diarrhea in children worldwide and remains a significant public health threat. However, studies on the association between rotavirus gastroenteritis epidemics and meteorological factors in Japan are still scarce. In this study, we aimed to quantify the short-term effects of meteorological factors on the incidence of rotavirus gastroenteritis in Japan using advanced time-series modeling approaches.

Long-Term Survival Among Children With Trisomy 13 and Trisomy 18 by Cytogenetic Status.

JAMA Netw Open

September 2025

Department of Epidemiology, University of Texas Health Science Center at Houston School of Public Health, Houston.

Katherine L Ludorf, Renata H Benjamin, Charles J Shumate, Mark A Canfield, Joanne Nguyen

Importance: Trisomy 13 (T13) and trisomy 18 (T18) are chromosomal abnormalities with high mortality rates in the first year of life. Understanding differences in long-term survival between children with full vs mosaic or partial trisomy is crucial for prognosis and health care planning.

Objective: To examine the differences in 10-year survival between children with full T13 and T18 vs those with mosaic or partial trisomy.

Young Adults with Ischemic Stroke in Argentina: A National Multicenter Retrospective Registry Analysis (JACARANDA).

Int J Stroke

September 2025

Centro integral de Neurología Vascular, FLENI.

Fabio M Maximiliano Gonzalez, Juan Ignacio Lopez, Flavia Tamagnini, Pablo Bonardo, Norberto Cotti

Background: Young adults account for up to 15% of all ischemic strokes, yet data from Latin America remain scarce. Understanding their clinical profile and outcomes is essential to inform targeted interventions and public health strategies. We aimed to characterize demographics, vascular risk factors, stroke etiology, access to acute reperfusion therapies, and 90-day outcomes in Argentine patients aged 18-50 years with ischemic stroke.

Risk factors for severe COVID-19 and development of a predictive model.

BMC Pulm Med

September 2025

North China University of Science and Technology Affiliated Hospital, Tangshan, Hebei, China.

Ling Zhang, Xinran Li, Ziyan Wang, Lei Zhao, Huixia Gao

A clinical case-control study was conducted to identify risk factors for severe COVID-19 and to develop a predictive risk model to provide a reference for the dynamic assessment of the severity of disease in COVID-19 patients. A total of 410 patients with COVID-19 were included in the study, of whom 132 had severe or critical cases. The clinical data of the patients were collected, and the variables were subsequently screened via LASSO regression analysis and 10-fold cross-validation.
