Citations

20

Article Abstract

Objectives: Eye-related conditions are a prevalent issue that continues to grow worldwide, affecting the sight of at least 2.2 billion individuals globally. Many patients may have questions or concerns that they bring to the internet before their healthcare provider, which can impact their health behavior. With the popularity of large language model (LLM)-based artificial intelligence (AI) chat platforms, like ChatGPT, there needs to be a better understanding of the suitability of their generated content. We aim to evaluate ChatGPT for the accuracy, comprehensiveness, and readability of its responses to ophthalmology-related medical inquiries.

Methodology: Twenty-two ophthalmology patient questions were generated based on commonly searched symptoms on Google Trends and used as inputs on ChatGPT. Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas were used to evaluate response readability. Two English-speaking, board-certified ophthalmologists evaluated the accuracy, comprehensiveness, and clarity of the responses as proxies for appropriateness. Other validated tools, including QUEST, DISCERN, and an urgency scale, were used for additional quality metrics. Responses were analyzed using descriptive statistics and comparative tests.

Results: All responses scored a 2.0 for QUEST Tone and 1.0 for Complementarity. DISCERN Uncertainty had a mean of 3.86 ± 0.48, with no responses receiving a 5. Urgency to seek care scores averaged 2.45 ± 0.60, with only the narrow-angle glaucoma response prompting an ambulance call. Readability scores resulted in a mean FRE of 45.3 ± 9.98 and FKGL of 10.1 ± 1.74. These quality assessment scores showed no significant differences between categories of conditions. The ophthalmologists' reviews rated 15/22 (68.18%) of responses as appropriate. The mean scores for accuracy, comprehensiveness, and clarity were 4.41 ± 0.73, 4.89 ± 0.32, and 4.55 ± 0.63, respectively, with comprehensiveness ranking significantly higher than the other aspects (p < 0.01). The responses for glaucoma and cataract had the lowest appropriateness ratings.

Conclusions: ChatGPT generally demonstrated appropriate responses to common ophthalmology questions, with high ratings for comprehensiveness, clarity, and support for medical professional follow-up. Performance did vary by conditions, with weaker appropriateness in responses related to glaucoma and cataract.
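The two readability measures named in the Methodology follow standard published formulas, and a short sketch may help in interpreting the reported means (FRE 45.3, FKGL 10.1). The abstract does not say which tool the authors used, so the function names and the vowel-run syllable counter below are illustrative assumptions and not the study's method. Real readability tools use more careful syllable counting and will give somewhat different scores.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a plain-text response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study.
    sample = "The eye is red. See a doctor."
    w, s, syl = text_counts(sample)
    print(round(flesch_reading_ease(w, s, syl), 2))
    print(round(flesch_kincaid_grade(w, s, syl), 2))
```

Both formulas depend only on average sentence length and average syllables per word, so a response with long sentences and polysyllabic medical terms scores lower on FRE and higher on FKGL.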

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349890
DOI: http://dx.doi.org/10.7759/cureus.87920

Publication Analysis

Top Keywords

accuracy comprehensiveness: 12
comprehensiveness clarity: 12
responses: 9
common ophthalmology: 8
ophthalmology questions: 8
responses glaucoma: 8
glaucoma cataract: 8
comprehensiveness: 5
evaluating chatgpt's: 4
accuracy: 4

Similar Publications

Amyloidosis encompasses a spectrum of rare disorders characterized by extracellular amyloid deposition. Achieving an accurate early diagnosis of systemic amyloidosis necessitates biopsy-specific pathological evaluation. Formalin-fixed, paraffin-embedded liver biopsy specimens were examined using Congo red staining, electron microscopy, immunohistochemistry (IHC), immunofluorescence, and Congo red-assisted laser microdissection with mass spectrometry (LMD/MS).

Ultrasonographic Analysis of Site-Specific Plantar Skin Thickness for Melanoma Staging and Excision.

Clin Anat

September 2025

Division in Anatomy and Developmental Biology, Department of Oral Biology, Human Identification Research Institute, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea.

Plantar melanomas present unique diagnostic and surgical challenges owing to substantial regional variations in skin thickness. Although the Breslow thickness remains the primary criterion for staging and surgical excision, its application on plantar melanoma is complicated by the inherent thickness of the glabrous plantar epidermis, which may lead to tumor depth overestimation. Accurate assessment of plantar skin thickness is essential for optimizing staging accuracy and refining surgical margins.

Introduction: Spinal cord injury (SCI) presents a significant burden to patients, families, and the healthcare system. The ability to accurately predict functional outcomes for SCI patients is essential for optimizing rehabilitation strategies, guiding patient and family decision making, and improving patient care.

Methods: We conducted a retrospective analysis of 589 SCI patients admitted to a single acute rehabilitation facility and used the dataset to train advanced machine learning algorithms to predict patients' rehabilitation outcomes.

Chronic Obstructive Pulmonary Disease (COPD) is a prevalent chronic respiratory disorder characterized by airway inflammation and irreversible airflow limitation. Its marked heterogeneity and complexity pose significant challenges to traditional clinical assessments in terms of prognostic prediction and personalized management. In recent years, the exploration of biomarkers has opened new avenues for the precise evaluation of COPD, particularly through multi-biomarker prediction models and integrative multimodal data strategies, which have substantially improved the accuracy and reliability of prognostic assessments.

This study introduces a Drought Adaptation Index (DAI), derived from Best Linear Unbiased Prediction (BLUP), as a method to assess drought resilience in switchgrass (Panicum virgatum L.). A panel of 404 genotypes was evaluated under drought-stressed (CV) and well-watered (UC) conditions over four consecutive years (2019-2022).
