Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Aim and Background: Patients are increasingly turning to the internet to learn more about their ocular disease. In this study, we sought (1) to compare the accuracy and readability of Google and ChatGPT responses to patients' glaucoma-related frequently asked questions (FAQs) and (2) to evaluate ChatGPT's capacity to improve glaucoma patient education materials by accurately reducing the grade level at which they are written.

Materials and Methods: We executed a Google search to identify the three most common FAQs related to 10 search terms associated with glaucoma diagnosis and treatment. Each of the 30 FAQs was entered into both Google and ChatGPT, and the responses were recorded. The accuracy of responses was evaluated by three glaucoma specialists, while readability was assessed using five validated readability indices. Subsequently, ChatGPT was instructed to generate patient education materials at specific reading levels to explain seven glaucoma procedures. The accuracy and readability of procedural explanations were measured.
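The abstract does not name the five validated readability indices used. As an illustrative sketch only, the following Python snippet computes the Flesch-Kincaid Grade Level, one widely used index, using a naive syllable-counting heuristic in place of the dictionary-based counters that validated readability tools rely on.

import re

def count_syllables(word):
    # Naive heuristic: count vowel groups; validated tools use dictionaries
    # or more careful rules, so treat this as an approximation.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # discount a likely silent trailing 'e'
    return max(count, 1)

def flesch_kincaid_grade(text):
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

if __name__ == "__main__":
    sample = ("Glaucoma damages the optic nerve over time. "
              "Eye drops can lower the pressure inside the eye.")
    print(round(flesch_kincaid_grade(sample), 1))

Run on a short patient-education passage, the function returns an approximate US school grade level, which is the scale on which the study's reading-level comparisons (e.g., grade 14.3 vs 9.4) are reported.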

Results: ChatGPT responses to glaucoma FAQs were significantly more accurate than Google responses (97 vs 77% accuracy, respectively; P < 0.001). ChatGPT responses were also written at a significantly higher reading level (grade 14.3 vs 9.4, respectively; P < 0.001). When instructed to revise glaucoma procedural explanations to improve understandability, ChatGPT reduced the average reading level of educational materials from grade 16.6 (college level) to grade 9.4 (high school level) (P < 0.001) without reducing the accuracy of procedural explanations.

Conclusion: ChatGPT is more accurate than Google search when responding to glaucoma patient FAQs. ChatGPT successfully reduced the reading level of glaucoma procedural explanations without sacrificing accuracy, with implications for the future of customized education for patients with varying health literacy.

Clinical Significance: Our study demonstrates the utility of ChatGPT for patients seeking information about glaucoma and for physicians when creating unique patient education materials at reading levels that optimize understanding by patients. An enhanced patient understanding of glaucoma may lead to informed decision-making and improve treatment compliance.

How to Cite This Article: Cohen SA, Fisher AC, Xu BY. Comparing the Accuracy and Readability of Glaucoma-related Question Responses and Educational Materials by Google and ChatGPT. J Curr Glaucoma Pract 2024;18(3):110-116.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576343
DOI: http://dx.doi.org/10.5005/jp-journals-10078-1448

Publication Analysis

Top Keywords

accuracy readability: 16
google chatgpt: 16
chatgpt responses: 16
patient education: 16
educational materials: 12
education materials: 12
procedural explanations: 12
reading level: 12
chatgpt: 11
glaucoma: 11

Similar Publications

Myocarditis is an inflammation of heart tissue. Cardiovascular magnetic resonance imaging (CMR) has emerged as an important non-invasive imaging tool for diagnosing myocarditis; however, interpretation remains a challenge for novice physicians. Advancements in machine learning (ML) models have further improved diagnostic accuracy, demonstrating good performance.

Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents when preparing for examinations. These studies, especially those conducted with multiple-choice questions, emphasize that the level of knowledge and response consistency of the LLMs are generally acceptable; however, further optimization is needed in areas such as case discussion, interpretation, and language proficiency. Therefore, this study aimed to evaluate the performance of six distinct LLMs for Turkish and English neurosurgery multiple-choice questions and assess their accuracy and consistency in a specialized medical context.

Purpose: Large language models (LLMs) can assist patients who seek medical knowledge online to guide their own glaucoma care. Understanding the differences in LLM performance on glaucoma-related questions can inform patients about the best resources to obtain relevant information.

Methods: This cross-sectional study evaluated the accuracy, comprehensiveness, quality, and readability of LLM-generated responses to glaucoma inquiries.

Objective: To assess the ability of large language models (LLMs) to accurately simplify lumbar spine magnetic resonance imaging (MRI) reports.

Materials And Methods: Patients who underwent lumbar decompression and/or fusion surgery in 2022 at one tertiary academic medical center were queried using appropriate CPT codes. We then identified all patients with a preoperative ICD diagnosis of lumbar spondylolisthesis and extracted the latest preoperative spine MRI radiology report text.

Background: Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others are using artificial intelligence in the form of online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy.
