Is Information About Musculoskeletal Malignancies From Large Language Models or Web Resources at a Suitable Reading Level for Patients?

Paul G Guirguis , Mark P Youssef , Ankit Punreddy , Mina Botros , Mattie Raiford , Susan McDowell

Clin Orthop Relat Res

Department of Orthopaedics and Physical Performance, University of Rochester Medical Center, Rochester, NY, USA.

Published: February 2025

Background: Patients and caregivers may experience immense distress when receiving the diagnosis of a primary musculoskeletal malignancy, and many subsequently turn to internet resources, including Google and artificial intelligence tools such as ChatGPT, for more information. It is not clear whether these resources offer information that is readable, that is, easy for patients to understand. The objective of this study was to compare the readability of Google search results and ChatGPT answers to frequently asked questions and to assess whether these sources meet NIH recommendations for readability.

Questions/purposes: (1) What is the readability of ChatGPT-3.5 as a source of patient information for the three most common primary bone malignancies compared with top online resources from Google search? (2) Do ChatGPT-3.5 responses and online resources meet NIH readability guidelines for patient education materials?

Methods: This was a cross-sectional analysis of the 12 most common online questions about each of osteosarcoma, chondrosarcoma, and Ewing sarcoma (36 questions in total). To be consistent with other studies of similar design that used national society lists of frequently asked questions, questions were selected from the American Cancer Society and categorized by content: diagnosis, treatment, and recovery and prognosis. Google was queried using all 36 questions, and the top responses were recorded. Author types, such as hospital systems, national health organizations, or independent researchers, were recorded. ChatGPT-3.5 was given each question in an independent query without further prompting. Responses were assessed with validated reading indices to determine readability by grade level. An independent-samples t-test was performed with significance set at p < 0.05.
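As an illustration of how a response can be scored by grade level, the sketch below computes the Flesch-Kincaid Grade Level, one widely used readability index. The abstract does not name the indices the authors used, so the choice of this index is an assumption, as are the function names and the rough vowel-group syllable heuristic (dedicated tools count syllables more carefully).

```python
import re

NIH_TARGET_GRADE = 6.0  # sixth-grade level cited in the abstract


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def meets_nih_target(text: str) -> bool:
    """True if the estimated grade level is at or below the sixth-grade target."""
    return flesch_kincaid_grade(text) <= NIH_TARGET_GRADE


if __name__ == "__main__":
    sample = "Osteosarcoma necessitates multidisciplinary chemotherapeutic intervention."
    print(round(flesch_kincaid_grade(sample), 1), meets_nih_target(sample))
```

Long sentences and polysyllabic medical terms both raise the score, which is why dense clinical wording tends to land well above the sixth-grade target.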

Results: Google (n = 36) and ChatGPT-3.5 (n = 36) answers were recorded, 12 for each of the three cancer types. Reading grade levels based on mean readability scores were 11.0 ± 2.9 for Google and 16.1 ± 3.6 for ChatGPT-3.5. These correspond to an eleventh-grade reading level for Google and a fourth-year undergraduate level for ChatGPT-3.5. Google answers were more readable across all individual indices, without differences in word count. No difference in readability was present across author type, question category, or cancer type. Of 72 total responses across both search modalities, none met NIH readability criteria at the sixth-grade level.
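The reported group comparison can be approximately reproduced from the summary statistics alone. The sketch below is a minimal illustration that assumes the ± values are standard deviations and that a pooled-variance (Student) independent t-test was used; the abstract states neither, and the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Values from the Results: Google 11.0 ± 2.9 (n = 36), ChatGPT-3.5 16.1 ± 3.6 (n = 36).
t_stat, df = pooled_t_from_summary(11.0, 2.9, 36, 16.1, 3.6, 36)

# Approximate two-sided critical value for alpha = 0.05 at df = 70 (from standard t tables).
CRITICAL_T_DF70 = 1.994

if __name__ == "__main__":
    print(f"t = {t_stat:.2f}, df = {df}, significant: {abs(t_stat) > CRITICAL_T_DF70}")
```

Under these assumptions the statistic is about -6.62 on 70 degrees of freedom, far beyond the critical value, which is consistent with the reported difference in readability between the two sources.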

Conclusion: Google material was presented at a high school reading level, whereas ChatGPT-3.5 was at an undergraduate reading level. The readability of both resources was inadequate based on NIH recommendations. Improving readability is crucial for better patient understanding during cancer treatment. Physicians should assess patients' needs, offer them tailored materials, and guide them to reliable resources to prevent reliance on online information that is hard to understand.

Level Of Evidence: Level III, prognostic study.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753740
DOI: http://dx.doi.org/10.1097/CORR.0000000000003263
