Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.

Jalil Jalili, Anuwat Jiravarnsirikul, Christopher Bowd, Benton Chuter, Akram Belghith, Michael H Goldbaum, Sally L Baxter, Robert N Weinreb, Linda M Zangwill, Mark Christopher

Ophthalmol Sci

Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, California.

Published: November 2024

Purpose: To assess the diagnostic accuracy of GPT-4V (OpenAI) and its ability to identify glaucoma-related features, compared with expert evaluations.

Design: Evaluation of multimodal large language models for reviewing fundus images in glaucoma.

Subjects: A total of 300 fundus images from 3 public datasets (ACRIMA, ORIGA, and RIM-ONE v3), comprising 139 glaucomatous and 161 nonglaucomatous cases, were analyzed.

Methods: Preprocessing ensured each image was centered on the optic disc. GPT-4's vision-preview model (GPT-4V) assessed each image for various glaucoma-related criteria: image quality, image gradability, cup-to-disc ratio, peripapillary atrophy, disc hemorrhages, rim thinning (by quadrant and clock hour), glaucoma status, and estimated probability of glaucoma. Each image was analyzed twice by GPT-4V to evaluate consistency in its predictions. Two expert graders independently evaluated the same images using identical criteria. Comparisons between GPT-4V's assessments, expert evaluations, and dataset labels were made to determine accuracy, sensitivity, specificity, and Cohen kappa.

Main Outcome Measures: The main parameters measured were the accuracy, sensitivity, specificity, and Cohen kappa of GPT-4V in detecting glaucoma compared with expert evaluations.
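The abstract reports accuracy, sensitivity, specificity, and Cohen kappa without stating how they were computed. The following is a minimal Python sketch, assuming binary labels (1 = glaucoma, 0 = nonglaucoma) and unweighted kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each rater's label frequencies. The function names `binary_metrics` and `cohen_kappa` are hypothetical, and this is not the authors' analysis code. The same kappa function could compare GPT-4V with a grader, two graders with each other, or GPT-4V's two runs on the same images.

```python
from collections import Counter
from typing import Hashable, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumption: 1 = glaucoma (positive class), 0 = nonglaucoma.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # glaucoma called glaucoma
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal called glaucoma
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # glaucoma missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity/specificity are undefined when a class is absent.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen kappa between two raters labeling the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("rater sequences must be non-empty and equal in length")
    # Observed agreement: fraction of items where both raters match.
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of each rater's marginal label proportions.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)
```

For six hypothetical images with labels [1, 1, 1, 0, 0, 0] and predictions [1, 1, 0, 0, 0, 1], the sketch gives accuracy, sensitivity, and specificity of 2/3 each, and a kappa of 1/3, because chance agreement is 0.5 when both raters label half the images as glaucoma.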

Results: GPT-4V provided glaucoma assessments for all 300 fundus images across the datasets, although approximately 35% required multiple prompt submissions. Across the ACRIMA, ORIGA, and RIM-ONE datasets, GPT-4V's overall accuracy in glaucoma detection (0.68, 0.70, and 0.81, respectively) was slightly lower than that of expert grader 1 (0.78, 0.80, and 0.88) and expert grader 2 (0.72, 0.78, and 0.87). For glaucoma detection, GPT-4V's agreement varied by dataset and expert grader, with Cohen kappa values ranging from 0.08 to 0.72. For feature detection, GPT-4V showed high consistency (repeatability) in image gradability, with an agreement accuracy of ≥89%, and substantial agreement in rim thinning and cup-to-disc ratio assessments, although its kappa values were generally lower than those for expert-to-expert agreement.

Conclusions: GPT-4V shows promise as a tool in glaucoma screening and detection through fundus image analysis, demonstrating generally high agreement with expert evaluations of key diagnostic features, although agreement did vary substantially across datasets.

Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773068
DOI: http://dx.doi.org/10.1016/j.xops.2024.100667

Similar Publications

Safety assessment of laronidase: real-world adverse event analysis based on the FDA adverse event reporting system (FAERS).

Front Pharmacol

August 2025

Department of Clinical Pharmacy, Meizhou People's Hospital (Huangtang Hospital), Meizhou, China.

Zhuomiao Lin, Junling Xue, Meiqing Yang, Xihui Yu, Jiahong Zhong

Objective: Laronidase is the first enzyme replacement therapy approved for the treatment of mucopolysaccharidosis type I (MPS I). However, its adverse events (AEs) have not been investigated in real-world settings. The aim of this study was to investigate AEs associated with laronidase using the Food and Drug Administration Adverse Event Reporting System (FAERS).

AI-GUIDED ENDPOINT SELECTION FOR NEUROPROTECTION TRIALS IN GLAUCOMA.

medRxiv

August 2025

Douglas R da Costa, Rafael Scherer, Swarup Swaminathan, Henry Tseng, Felipe A Medeiros

Unlabelled: Standard Automated Perimetry (SAP) is the mainstay for monitoring glaucoma progression and has been accepted by the U.S. Food and Drug Administration (FDA) as a trial endpoint, but only under stringent criteria of ≥7 dB loss in five pre-specified test locations.

The Evolution of Visual Field Testing: A 40-Year Perspective on Modern Perimetry in Glaucoma.

Ophthalmol Glaucoma

September 2025

Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York.

Anfei Li, C Gustavo De Moraes

The assessment of the human visual field, a concept explored since ancient Greece, underwent a critical transformation in the 19th century with the advent of objective measurement techniques. Early methodologies concentrated on mapping the outer limits of vision, a practice known as perimetry. However, the focus soon shifted toward campimetry (although the name perimetry remained), which involves assessing defects within the central and paracentral visual field, a crucial development for diagnosing diseases such as glaucoma.

The association between SLIT2 in human vitreous humor and plasma and neurocognitive test scores.

J Alzheimers Dis

September 2025

Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.

Sara I Shoushtari, Easton Liaw, Sreevardhan Alluri, Zahra Sheikh, Sudhir Kumar

Background: Slit Guidance Ligand 2 (SLIT2) binds Roundabout (ROBO) guidance receptors to direct axon pathfinding and neuron migration during nervous system development. SLIT2 expression has previously been linked to dementia risk. Objective: To study the association between SLIT2 expression in human vitreous humor and plasma samples and neurocognitive test scores in a cross-sectional cohort study using a novel, highly sensitive Meso Scale Discovery (MSD) assay for SLIT2 detection.

Optimizing vision care: Dual path network model in eye disease classification.

Comput Biol Med

September 2025

Department of Mathematics, Faculty of Education, Kafkas University, Kars, Turkey.

Raji Elsa Varghese, S Immanuel Alex Pandian, K Martin Sagayam, J Anitha, Kottakkaran Sooppy Nisar

The increasing prevalence and severity of eye diseases worldwide underscore the urgent need for advanced diagnostic tools and interventions to address the growing burden on global public health. The study on eye disease classification holds significant relevance due to its potential impact on enhancing early detection, diagnosis, and treatment of various ocular conditions. Timely and accurate identification of eye diseases such as cataracts, glaucoma and diabetic retinopathy is crucial for preventing vision loss and improving overall patient outcomes.
