Background: With the development of artificial intelligence, obtaining patient-centered medical information through large language models (LLMs) has become crucial for patient education. However, existing digital resources in online health care are of heterogeneous quality, and the reliability and readability of content generated by different AI models must be evaluated to meet the needs of patients with varying levels of health literacy.
Objective: This study aims to compare the accuracy and readability of different LLMs in providing medical information related to gynecomastia, and to identify the most promising patient education tools for practical clinical application.
Methods: This study selected the 10 most frequently searched questions about gynecomastia, identified from PubMed and Google Trends. Responses were generated using 3 LLMs (DeepSeek-R1, OpenAI-O3, and Claude-4-Sonnet), and text quality was assessed with the DISCERN-AI and PEMAT-AI scales. Readability was evaluated using metrics including word count, syllable count, the Flesch-Kincaid Grade Level (FKGL), Flesch-Kincaid Reading Ease (FKRE), Simple Measure of Gobbledygook (SMOG) index, and Automated Readability Index (ARI).
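The four readability formulas named above are standard and publicly documented. Below is a minimal sketch of how they can be computed from raw text counts; the syllable counter is a naive vowel-group heuristic (an assumption — dedicated tools use dictionary-based counting, so exact values may differ slightly):

import re
import math

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, subtract a trailing silent "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    W = max(len(words), 1)
    S = max(len(sentences), 1)
    syllables = sum(count_syllables(w) for w in words)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    chars = sum(len(w) for w in words)
    return {
        "words": W,
        "syllables": syllables,
        # Flesch-Kincaid Grade Level
        "FKGL": 0.39 * W / S + 11.8 * syllables / W - 15.59,
        # Flesch-Kincaid Reading Ease
        "FKRE": 206.835 - 1.015 * W / S - 84.6 * syllables / W,
        # SMOG index (formally defined for samples of >= 30 sentences)
        "SMOG": 1.0430 * math.sqrt(polysyllables * 30 / S) + 3.1291,
        # Automated Readability Index
        "ARI": 4.71 * chars / W + 0.5 * W / S - 21.43,
    }

print(readability("Gynecomastia is benign enlargement of male breast tissue."))

Higher FKRE indicates easier text, while FKGL, SMOG, and ARI approximate the US school grade level required to understand it; an FKGL of 14 therefore corresponds to roughly college-level reading.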
Results: In the quality evaluation, among the 10 items of the DISCERN-AI scale, only the overall content quality score differed significantly (P = 0.001), with DeepSeek-R1 performing best at a median score of 5 (5, 5). Regarding readability, DeepSeek-R1 produced the highest mean word and syllable counts (both P < 0.001). The 3 models showed no significant differences in FKGL, FKRE, or ARI; mean FKGL scores were 14.08 for DeepSeek-R1, 14.1 for OpenAI-O3, and 13.31 for Claude-4-Sonnet. The SMOG evaluation indicated that Claude-4-Sonnet was the most readable, with a mean score of 11 (P = 0.028).
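The abstract reports medians and P values for three-group comparisons but does not name the statistical test; for ordinal scale scores across three independent groups, the Kruskal-Wallis H test is a common choice. A minimal sketch with hypothetical, illustrative scores (not the study's data):

from scipy import stats

# Hypothetical DISCERN-AI overall-quality scores per model (illustration only).
deepseek_r1 = [5, 5, 5, 5, 4, 5, 5, 5, 5, 5]
openai_o3 = [4, 4, 5, 3, 4, 4, 5, 4, 4, 3]
claude_4_sonnet = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4]

h, p = stats.kruskal(deepseek_r1, openai_o3, claude_4_sonnet)
print(f"Kruskal-Wallis H = {h:.2f}, P = {p:.4f}")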
Conclusion: DeepSeek-R1 demonstrated the highest overall quality in content generation, followed by Claude-4-Sonnet. Evaluations using FKGL, the SMOG index, and ARI all indicated that Claude-4-Sonnet exhibited the best readability. Given that improvements in quality and readability can enhance patient engagement and reduce anxiety, these 2 models should be prioritized for patient education applications. Future efforts should focus on integrating these strengths to develop more reliable large-scale medical language models.
DOI: http://dx.doi.org/10.1097/SCS.0000000000011930
J Craniofac Surg
September 2025
Department of Breast Plastic Surgery, Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shijingshan, Beijing, China.
JCO Clin Cancer Inform
September 2025
USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA.
Purpose: To evaluate a generative artificial intelligence (GAI) framework for creating readable lay abstracts and summaries (LASs) of urologic oncology research that maintain accuracy, completeness, and clarity, and to assess their comprehension and perception among patients and caregivers.
Methods: Forty original abstracts (OAs) on prostate, bladder, kidney, and testis cancers from leading journals were selected. LASs were generated using a free GAI tool, with three versions per abstract for consistency.
Rev Bras Enferm
September 2025
Universidade Católica de Pernambuco. Recife, Pernambuco, Brazil.
Objectives: To develop a digital educational technology on LGBT-phobic bullying, in the form of a comic book, for health education among school-aged adolescents.
Methods: A methodological study employing the Planning of Computer-Supported Learning Activities method to guide the organization of development stages, combined with Edgar Morin's pedagogical framework, from the perspective of comprehension and health education in the context of sexual and gender diversity.
Results: The comic book "LGBT-Phobic Bullying: Shall We Talk?" was developed to contribute to education and awareness in the fight against LGBT-phobic bullying in school environments, serving as a health educational technology product.
PLoS One
September 2025
Seidenberg School of Computer Science and Information Systems, Pace University, New York, New York, United States of America.
While there has been extensive research on techniques for explainable artificial intelligence (XAI) to enhance AI recommendations, the metacognitive processes in interacting with AI explanations remain underexplored. This study examines how AI explanations impact human decision-making by leveraging cognitive mechanisms that evaluate the accuracy of AI recommendations. We conducted a large-scale experiment (N = 4,302) on Amazon Mechanical Turk (AMT), where participants classified radiology reports as normal or abnormal.