Evaluating ChatGPT's Accuracy and Readability in Responding to Common Ophthalmology Questions.

Parsa Riazi Esfahani, Jason Ward, Aidan Yong, Tri Brian Nguyen, Akshay J Reddy, Sina Sobhani, Dalbert Chen, Marib Akanda, Shazia Sheikh

Cureus

Medicine, California University of Science and Medicine, Colton, USA.

Published: July 2025

Objectives: Eye-related conditions are a growing problem, affecting the sight of at least 2.2 billion people worldwide. Many patients bring questions or concerns to the internet before their healthcare provider, which can affect their health behavior. Given the popularity of large language model (LLM)-based artificial intelligence (AI) chat platforms such as ChatGPT, the suitability of the content they generate needs to be better understood. We aim to evaluate the accuracy, comprehensiveness, and readability of ChatGPT's responses to ophthalmology-related medical inquiries.

Methodology: Twenty-two ophthalmology patient questions were generated based on commonly searched symptoms on Google Trends and used as inputs on ChatGPT. Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas were used to evaluate response readability. Two English-speaking, board-certified ophthalmologists evaluated the accuracy, comprehensiveness, and clarity of the responses as proxies for appropriateness. Other validated tools, including QUEST, DISCERN, and an urgency scale, were used for additional quality metrics. Responses were analyzed using descriptive statistics and comparative tests.

Results: All responses scored 2.0 for QUEST Tone and 1.0 for Complementarity. DISCERN Uncertainty had a mean of 3.86 ± 0.48, with no responses receiving a 5. Urgency-to-seek-care scores averaged 2.45 ± 0.60, with only the narrow-angle glaucoma response prompting an ambulance call. Readability scoring yielded a mean FRE of 45.3 ± 9.98 and a mean FKGL of 10.1 ± 1.74. These quality assessment scores showed no significant differences between categories of conditions. The ophthalmologists rated 15/22 (68.18%) of responses as appropriate. The mean scores for accuracy, comprehensiveness, and clarity were 4.41 ± 0.73, 4.89 ± 0.32, and 4.55 ± 0.63, respectively, with comprehensiveness ranking significantly higher than the other aspects (p < 0.01). The responses for glaucoma and cataract had the lowest appropriateness ratings.
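The two readability formulas named in the methodology can be sketched in a few lines of Python. The FRE and FKGL equations below are the standard published ones; the syllable counter and the `text_counts` helper are crude, hypothetical stand-ins added for illustration and are not the tool the authors used, so scores from this sketch will not exactly reproduce the reported values.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Assumption: approximate syllables as runs of vowels (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Hypothetical helper: return (words, sentences, syllables) for a text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, syllables


if __name__ == "__main__":
    sample = "The eye is red. See a doctor."
    w, s, syl = text_counts(sample)
    print(round(flesch_reading_ease(w, s, syl), 2))
    print(round(flesch_kincaid_grade(w, s, syl), 2))
```

On this scale, the reported mean FKGL of 10.1 corresponds to roughly a tenth-grade reading level.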

Conclusions: ChatGPT generally gave appropriate responses to common ophthalmology questions, with high ratings for comprehensiveness, clarity, and support for medical professional follow-up. Performance varied by condition, with weaker appropriateness in responses related to glaucoma and cataract.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349890
DOI: http://dx.doi.org/10.7759/cureus.87920

Similar Publications

A comprehensive panel of testing for amyloidosis.

J Histotechnol

September 2025

Department of Pathology, Peking University Third Hospital, Beijing, China.

Hong Tang, Feifei Zhao, Yingwei Zhu, Rongrong Xu, Huiqing Yuan

Amyloidosis encompasses a spectrum of rare disorders characterized by extracellular amyloid deposition. Achieving an accurate early diagnosis of systemic amyloidosis necessitates biopsy-specific pathological evaluation. Formalin-fixed, paraffin-embedded liver biopsy specimens were examined using Congo red staining, electron microscopy, immunohistochemistry (IHC), immunofluorescence, and Congo red-assisted laser microdissection with mass spectrometry (LMD/MS).

Ultrasonographic Analysis of Site-Specific Plantar Skin Thickness for Melanoma Staging and Excision.

Clin Anat

September 2025

Division in Anatomy and Developmental Biology, Department of Oral Biology, Human Identification Research Institute, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea.

Jiin Kim, Subin Hur, Kyu-Lim Lee, Hee-Jin Kim

Plantar melanomas present unique diagnostic and surgical challenges owing to substantial regional variations in skin thickness. Although the Breslow thickness remains the primary criterion for staging and surgical excision, its application on plantar melanoma is complicated by the inherent thickness of the glabrous plantar epidermis, which may lead to tumor depth overestimation. Accurate assessment of plantar skin thickness is essential for optimizing staging accuracy and refining surgical margins.

Machine learning predicts improvement of functional outcomes in spinal cord injury patients after inpatient rehabilitation.

Front Rehabil Sci

August 2025

Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, CA, United States.

Mohammad Rasoolinejad, Irene Say, Peter B Wu, Xinran Liu, Yan Zhou

Introduction: Spinal cord injury (SCI) presents a significant burden to patients, families, and the healthcare system. The ability to accurately predict functional outcomes for SCI patients is essential for optimizing rehabilitation strategies, guiding patient and family decision making, and improving patient care.

Methods: We conducted a retrospective analysis of 589 SCI patients admitted to a single acute rehabilitation facility and used the dataset to train advanced machine learning algorithms to predict patients' rehabilitation outcomes.

Prognostic Value of Biomarkers in Chronic Obstructive Pulmonary Disease: A Comprehensive Review.

Int J Chron Obstruct Pulmon Dis

September 2025

The First Clinical Medical College of Lanzhou University, Lanzhou, People's Republic of China.

Yunpeng Xu, Lei Zhang, Lei Zhu, Zi Yang, Xue Bai

Chronic Obstructive Pulmonary Disease (COPD) is a prevalent chronic respiratory disorder characterized by airway inflammation and irreversible airflow limitation. Its marked heterogeneity and complexity pose significant challenges to traditional clinical assessments in terms of prognostic prediction and personalized management. In recent years, the exploration of biomarkers has opened new avenues for the precise evaluation of COPD, particularly through multi-biomarker prediction models and integrative multimodal data strategies, which have substantially improved the accuracy and reliability of prognostic assessments.

Drought adaptation index (DAI) based on BLUP as a selection approach for drought-resilient switchgrass germplasm.

Front Genet

August 2025

Center for Applied Genetic Technologies, University of Georgia, Athens, GA, United States.

Shiva Om Makaju, Hari Bahadur Chhetri, Chanaka Roshan Abeyratne, Mirko Pavicic, Hari Poudel

This study introduces a Drought Adaptation Index (DAI), derived from Best Linear Unbiased Prediction (BLUP), as a method to assess drought resilience in switchgrass (Panicum virgatum L.). A panel of 404 genotypes was evaluated under drought-stressed (CV) and well-watered (UC) conditions over four consecutive years (2019-2022).
