Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.

Ophthalmol Sci

Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, California.

Published: November 2024



Article Abstract

Purpose: To assess the diagnostic accuracy of GPT-4V (OpenAI) and its ability to identify glaucoma-related features, compared with expert evaluations.

Design: Evaluation of multimodal large language models for reviewing fundus images in glaucoma.

Subjects: A total of 300 fundus images from 3 public datasets (ACRIMA, ORIGA, and RIM-ONE v3), comprising 139 glaucomatous and 161 nonglaucomatous cases, were analyzed.

Methods: Preprocessing ensured each image was centered on the optic disc. GPT-4's vision-preview model (GPT-4V) assessed each image for various glaucoma-related criteria: image quality, image gradability, cup-to-disc ratio, peripapillary atrophy, disc hemorrhages, rim thinning (by quadrant and clock hour), glaucoma status, and estimated probability of glaucoma. Each image was analyzed twice by GPT-4V to evaluate consistency in its predictions. Two expert graders independently evaluated the same images using identical criteria. Comparisons between GPT-4V's assessments, expert evaluations, and dataset labels were made to determine accuracy, sensitivity, specificity, and Cohen kappa.

Main Outcome Measures: The main parameters measured were the accuracy, sensitivity, specificity, and Cohen kappa of GPT-4V in detecting glaucoma compared with expert evaluations.
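As an illustration of the outcome measures named above, the following minimal sketch computes accuracy, sensitivity, specificity, and Cohen kappa for a binary glaucoma call against a reference label. It is not the authors' code: the function names (`confusion_counts`, `detection_metrics`) and the toy labels are hypothetical, and the labels are assumed to be encoded as 1 = glaucoma and 0 = nonglaucoma.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels are 1 = glaucoma, 0 = nonglaucoma."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def detection_metrics(truth: Sequence[int], pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, and Cohen kappa for binary labels."""
    tp, fp, tn, fn = confusion_counts(truth, pred)
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("at least one labelled image is required")
    nan = float("nan")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # true-positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan  # true-negative rate

    # Cohen kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from each rater's marginal label rates.
    p_observed = accuracy
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else nan

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    model_call = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    for name, value in detection_metrics(reference, model_call).items():
        print(f"{name}: {value:.2f}")
```

The same function can be applied to any pair of label vectors, for example model versus dataset label, model versus an expert grader, or one expert grader versus the other, which is how agreement figures of this kind are typically compared.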

Results: GPT-4V successfully provided glaucoma assessments for all 300 fundus images across the datasets, although approximately 35% required multiple prompt submissions. On the ACRIMA, ORIGA, and RIM-ONE datasets, respectively, GPT-4V's overall accuracy in glaucoma detection (0.68, 0.70, and 0.81) was slightly lower than that of expert grader 1 (0.78, 0.80, and 0.88) and expert grader 2 (0.72, 0.78, and 0.87). In glaucoma detection, GPT-4V showed variable agreement across datasets and expert graders, with Cohen kappa values ranging from 0.08 to 0.72. In feature detection, GPT-4V demonstrated high consistency (repeatability) in image gradability, with an agreement accuracy of ≥89%, and substantial agreement in rim thinning and cup-to-disc ratio assessments, although kappas were generally lower than expert-to-expert agreement.

Conclusions: GPT-4V shows promise as a tool for glaucoma screening and detection through fundus image analysis, demonstrating generally high agreement with expert evaluations of key diagnostic features, although agreement varied substantially across datasets.

Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773068
DOI: http://dx.doi.org/10.1016/j.xops.2024.100667

