Background And Aim: Multimodal large language models (LLMs) have shown potential in processing both text and image data for clinical applications. This study evaluated their diagnostic performance in identifying retinal diseases from optical coherence tomography (OCT) images.
Methods: We assessed the diagnostic accuracy of GPT-4o and Claude Sonnet 3.5 using two public OCT datasets (OCTID, OCTDL) containing expert-labeled images of four pathological conditions and normal retinas. Both models were tested using single-shot and few-shot prompts, for a total of 3,088 API calls. Statistical analyses were performed to evaluate differences in overall and condition-specific performance.
Results: GPT-4o's accuracy improved from 56.29% with single-shot prompts to 73.08% with few-shot prompts (p < 0.001). Similarly, Claude Sonnet 3.5 increased from 40.03% to 70.98% using the same approach (p < 0.001). Condition-specific analyses revealed similar trends, with absolute improvements ranging from 2% to 64%. These findings were consistent across the validation dataset.
Conclusion: Few-shot prompted multimodal LLMs show promise for clinical integration, particularly in identifying normal retinas, which could help streamline referral processes in primary care. While these models fall short of the diagnostic accuracy reported in established deep learning literature, they offer simple, effective tools for assisting in routine retinal disease diagnosis. Future research should focus on further validation and integrating clinical text data with imaging.
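The few-shot setup described in the Methods can be approximated with a short API script. Below is a minimal sketch assuming the OpenAI Python SDK; the file names, label set, and prompt wording are illustrative placeholders, not the study's actual materials.

```python
# Minimal sketch of a few-shot multimodal prompt, as described above.
# Assumptions (not from the paper): file paths, label names, and prompt
# wording are illustrative; the study's actual prompts are not shown here.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["normal", "AMD", "DME", "CSR", "MH"]  # hypothetical label set

def to_data_url(path: str) -> str:
    """Encode a local OCT image as a base64 data URL for the API."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

def few_shot_messages(examples: list[tuple[str, str]], query_path: str) -> list[dict]:
    """Build one user message: labeled example images, then the query image."""
    content = [{"type": "text",
                "text": f"Classify the final OCT image as one of: {', '.join(LABELS)}. "
                        "Labeled examples follow."}]
    for path, label in examples:
        content.append({"type": "image_url", "image_url": {"url": to_data_url(path)}})
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append({"type": "image_url", "image_url": {"url": to_data_url(query_path)}})
    content.append({"type": "text", "text": "Answer with a single label."})
    return [{"role": "user", "content": content}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=few_shot_messages([("amd_example.jpg", "AMD"),
                                ("normal_example.jpg", "normal")],
                               "query_scan.jpg"),
)
print(resp.choices[0].message.content)
```

The message structure, with labeled example images preceding the query image, is what distinguishes the few-shot condition from a single-shot prompt containing only the query.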
Full text: PMC (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093016) | DOI (http://dx.doi.org/10.1177/25158414251340569)
J Imaging Inform Med
September 2025
Department of Diagnostic, Interventional and Pediatric Radiology (DIPR), Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland.
Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs, deployed on limited local hardware resources for data extraction from free-text mammography reports, using a common data element (CDE)-based structure.
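As a rough illustration of this kind of locally deployed extraction pipeline, the sketch below assumes an Ollama-style local inference server; the endpoint, model name, and CDE fields are assumptions for illustration, not the study's actual schema.

```python
# Minimal sketch of CDE-style data extraction with a locally hosted
# open-source LLM. Assumptions: an Ollama server on localhost, and the
# example CDE fields below are illustrative, not the study's schema.
import json
import requests

CDE_FIELDS = ["breast_density", "birads_category", "mass_present"]  # hypothetical

def extract_cdes(report_text: str, model: str = "llama3") -> dict:
    """Ask a local model to fill a fixed CDE schema and return parsed JSON."""
    prompt = (
        "Extract the following fields from the mammography report as JSON "
        f"with exactly these keys: {CDE_FIELDS}. Use null if absent.\n\n"
        f"Report:\n{report_text}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "format": "json", "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

print(extract_cdes("Scattered fibroglandular densities. BI-RADS 2. No mass."))
```

Constraining the output to a fixed key set is what makes the free-text report mappable onto a common data element structure downstream.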
Comput Methods Programs Biomed
September 2025
Key Laboratory of Social Computing and Cognitive Intelligence (Ministry of Education), Dalian University of Technology, Dalian, 116024, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China. Electronic address:
Background And Objective: Few-shot learning has emerged as a key technological solution to challenges such as limited data and the difficulty of acquiring annotations in medical image classification. However, a single image modality alone is insufficient to capture conceptual categories, so medical image classification benefits from a comprehensive approach that captures conceptual category information to aid in interpreting image content.
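For readers unfamiliar with the paradigm, the sketch below shows one common few-shot classification scheme (nearest class prototype over support embeddings); it is a generic illustration, not the method proposed in this paper.

```python
# Minimal sketch of prototype-based few-shot classification (one common
# few-shot approach; not necessarily the method proposed in this paper).
# Embeddings are stand-ins for features from any pretrained image encoder.
import numpy as np

def classify_by_prototype(support: dict[str, np.ndarray], query: np.ndarray) -> str:
    """support maps class -> (n_shots, dim) embeddings; query is (dim,)."""
    prototypes = {c: e.mean(axis=0) for c, e in support.items()}  # class means
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

rng = np.random.default_rng(0)
support = {"benign": rng.normal(0.0, 1.0, (5, 128)),
           "malignant": rng.normal(0.5, 1.0, (5, 128))}
print(classify_by_prototype(support, rng.normal(0.5, 1.0, 128)))
```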
Int J Surg
September 2025
Digestive Endoscopy Center, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China.
Background: Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.
Methods: This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals.
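A paired comparison of this kind is typically scored per case against the guideline-recommended strategy; the sketch below illustrates one standard analysis (accuracy plus McNemar's test) with fabricated toy data, not the study's analysis or results.

```python
# Minimal sketch of a paired accuracy comparison: two raters (LLM and
# clinician) against a guideline reference, compared with McNemar's test.
# The toy arrays below are fabricated placeholders, not study data.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

reference = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # guideline-correct choice
llm = np.array([1, 0, 1, 0, 0, 1, 1, 1])        # model recommendations
clinician = np.array([1, 1, 1, 1, 0, 0, 1, 1])  # clinician recommendations

llm_ok, doc_ok = llm == reference, clinician == reference
table = [[np.sum(llm_ok & doc_ok), np.sum(llm_ok & ~doc_ok)],
         [np.sum(~llm_ok & doc_ok), np.sum(~llm_ok & ~doc_ok)]]
result = mcnemar(table, exact=True)  # paired test on discordant cases
print(f"LLM acc {llm_ok.mean():.2f}, clinician acc {doc_ok.mean():.2f}, "
      f"p = {result.pvalue:.3f}")
```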
AJR Am J Roentgenol
September 2025
Department of Radiology, Stanford University, Stanford, CA, USA.
The increasing complexity and volume of radiology reports present challenges for the timely communication of critical findings. This study evaluated the performance of two out-of-the-box LLMs in detecting and classifying critical findings in radiology reports using various prompt strategies. The analysis included 252 radiology reports of varying modalities and anatomic regions extracted from the MIMIC-III database, divided into a prompt engineering tuning set of 50 reports, a holdout test set of 125 reports, and a pool of 77 remaining reports used as examples for few-shot prompting.
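The few-shot condition implied by that example pool can be illustrated with a simple prompt builder; the reports, labels, and wording below are invented placeholders, not material from MIMIC-III or the study.

```python
# Minimal sketch of assembling a few-shot prompt from a pool of labeled
# example reports. Examples, labels, and wording are illustrative only.
import random

def build_prompt(pool: list[tuple[str, str]], target_report: str, k: int = 3) -> str:
    """Sample k (report, label) pairs and append the unlabeled target report."""
    shots = random.sample(pool, k)
    parts = ["Classify each radiology report as CRITICAL or NOT CRITICAL."]
    for report, label in shots:
        parts.append(f"Report: {report}\nAnswer: {label}")
    parts.append(f"Report: {target_report}\nAnswer:")
    return "\n\n".join(parts)

pool = [("New large pneumothorax.", "CRITICAL"),
        ("Stable postsurgical changes.", "NOT CRITICAL"),
        ("Acute PE in the right main pulmonary artery.", "CRITICAL"),
        ("Degenerative changes of the lumbar spine.", "NOT CRITICAL")]
print(build_prompt(pool, "Free air under the diaphragm."))
```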
Front Digit Health
August 2025
Department of Ophthalmology, Stanford University, Palo Alto, CA, United States.
Introduction: Vision language models (VLMs) combine image analysis capabilities with large language models (LLMs). Because of their multimodal capabilities, VLMs offer a clinical advantage over image classification models for the diagnosis of optic disc swelling by allowing consideration of clinical context. In this study, we compare the performance of non-specialty-trained VLMs with different prompts in the classification of optic disc swelling on fundus photographs.
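Comparing prompt variants against a general-purpose VLM reduces to looping over prompts for the same image; the sketch below assumes the OpenAI Python SDK and GPT-4o as one such VLM, with the prompts and image file as illustrative placeholders, not the study's materials.

```python
# Minimal sketch of comparing prompt variants for a general-purpose VLM.
# The prompts, model choice, and image file are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "plain": "Is the optic disc swollen in this fundus photograph? Answer yes or no.",
    "role": "You are an ophthalmologist. Is the optic disc swollen? Answer yes or no.",
}

def ask(prompt: str, image_path: str) -> str:
    """Send one prompt variant together with the fundus photograph."""
    with open(image_path, "rb") as f:
        url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": url}}]}],
    )
    return resp.choices[0].message.content

for name, prompt in PROMPTS.items():
    print(name, "->", ask(prompt, "fundus.jpg"))  # hypothetical image file
```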