Background: This study compares three large language models (LLMs) in answering common HIV questions, given ongoing concerns about their accuracy and reliability in patient education.

Methods: Models answered 63 HIV questions. Accuracy (5-point Likert scale), readability (Flesch-Kincaid, Gunning Fog, Coleman-Liau), and reliability (DISCERN, EQIP) were assessed.

Results: Claude 3.7 Sonnet showed significantly higher accuracy (4.54 ± 0.44) compared to ChatGPT-4o (4.29 ± 0.49) and Gemini Advanced 2.0 Flash (4.31 ± 0.50) (P < .001). ChatGPT-4o had lower accuracy in disease definition, follow-up, and transmission routes, while Gemini Advanced 2.0 Flash performed poorly in daily life and treatment-related questions. Readability analyses indicated ChatGPT-4o produced the most accessible content according to the Flesch-Kincaid and Coleman-Liau indices, whereas Claude 3.7 Sonnet was most comprehensible by Gunning Fog standards. Gemini Advanced 2.0 Flash consistently generated more complex texts across all readability measures (P < .001). Regarding reliability, Claude 3.7 Sonnet achieved "good" quality on DISCERN, while the others were rated "moderate" (P = .059). On EQIP, Claude 3.7 Sonnet (median 61.8) and ChatGPT-4o (55.3) were classified as "good quality with minor limitations," whereas Gemini Advanced 2.0 Flash (41.2) was rated "low quality" (P = .049).

Conclusions: Claude 3.7 Sonnet is preferable for accuracy and reliability, while ChatGPT-4o offers superior readability. Selecting LLMs for HIV education should consider accuracy, readability, and reliability, emphasizing regular assessment of content quality and cultural sensitivity.
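For context, the readability indices cited in the Methods follow standard published formulas; the minimal Python sketch below shows how such grade-level scores are typically computed. The paper does not describe its implementation, so the tokenization and the naive syllable counter here are illustrative assumptions, not the authors' pipeline.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups (assumed for illustration only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sent, n_words = max(1, len(sentences)), max(1, len(words))
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    # Flesch-Kincaid Grade Level
    fk = 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59
    # Gunning Fog Index
    fog = 0.4 * ((n_words / n_sent) + 100 * complex_words / n_words)
    # Coleman-Liau Index: L = letters per 100 words, S = sentences per 100 words
    L = letters / n_words * 100
    S = n_sent / n_words * 100
    cli = 0.0588 * L - 0.296 * S - 15.8
    return {"flesch_kincaid": fk, "gunning_fog": fog, "coleman_liau": cli}

print(readability("HIV is transmitted through blood, sexual contact, and from mother to child."))
```

Higher grade-level scores indicate less accessible text, which is the sense in which the abstract compares the three models.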
| Download full-text PDF | Source |
|---|---|
| http://dx.doi.org/10.1177/09564624251372369 | DOI Listing |
J Med Internet Res
September 2025
Artificial Intelligence Center, China Medical University Hospital, 2, Yude Road, Taichung, 404327, Taiwan, 886 4-22052121.
Background: The effective implementation of personalized pharmacogenomics (PGx) requires the integration of released clinical guidelines into decision support systems to facilitate clinical applications. Large language models (LLMs) can be valuable tools for automating information extraction and updates.
Objective: This study aimed to assess the effectiveness of repeated cross-comparisons and an agreement-threshold strategy in 2 advanced LLMs as supportive tools for updating information.
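The repeated cross-comparison and agreement-threshold strategy is described only at a high level; the sketch below illustrates one way such a workflow could look. The `llm_a`/`llm_b` callables, the token-overlap agreement measure, and the 0.8 threshold are hypothetical illustrations, not the authors' method.

```python
from typing import Callable, List

def agreement(a: str, b: str) -> float:
    # Token-overlap (Jaccard) agreement between two answers; a rough proxy.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def cross_compare(prompt: str,
                  llm_a: Callable[[str], str],
                  llm_b: Callable[[str], str],
                  repeats: int = 3,
                  threshold: float = 0.8) -> dict:
    """Query two LLMs repeatedly; accept the extracted information only if
    their answers agree above the threshold in every round, otherwise flag
    it for manual curation (a hypothetical workflow)."""
    scores: List[float] = []
    answers = []
    for _ in range(repeats):
        out_a, out_b = llm_a(prompt), llm_b(prompt)
        answers.append((out_a, out_b))
        scores.append(agreement(out_a, out_b))
    accepted = all(s >= threshold for s in scores)
    return {"accepted": accepted, "scores": scores, "answers": answers}
```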
J Clin Neurosci
September 2025
Nordwest-Krankenhaus Sanderbusch, Friesland Kliniken gGmbH, Department of Neurosurgery, Sande, Germany.
Background: Large language models (LLMs), with their remarkable ability to retrieve and analyse information within seconds, are generating significant interest in healthcare. This study aims to assess and compare the accuracy, completeness, and usefulness of the responses of Gemini Advanced, ChatGPT-3.5, and ChatGPT-4 in neuro-oncology cases.
Int J STD AIDS
September 2025
Infectious Diseases and Clinical Microbiology, Izmir Democracy University, Izmir, Turkey.
Front Oncol
August 2025
Department of Ophthalmology, The First Affiliated Hospital of Chongqing Medical University, Chongqing Key Laboratory for the Prevention and Treatment of Major Blinding Eye Diseases, Chongqing, China.
Background: Uveal melanoma is the most common primary intraocular malignancy in adults, yet radiotherapy decision-making for this disease often remains complex and variable. Although emerging generative AI models have shown promise in synthesizing vast clinical information, few studies have systematically compared their performance against experienced radiation oncologists in this specialized domain. This study examined the comparative accuracy of three leading generative AI models and experienced radiation oncologists in guideline-based clinical decision-making for uveal melanoma.
J Multidiscip Healthc
August 2025
Department of Tuberculosis, The Fourth People's Hospital of Nanning, Nanning, Guangxi, People's Republic of China.
Purpose: This study comprehensively assesses the performance of large language models (LLMs) in health consultations for individuals living with HIV, examines their applicability across multiple dimensions, and provides evidence-based support for clinical deployment.
Patients And Methods: A 23-question, multi-dimensional HIV-specific question bank was developed, covering fundamental knowledge, diagnosis, treatment, prognosis, and case analysis. Four advanced LLMs (ChatGPT-4o, Copilot, Gemini, and Claude) were tested using a multi-dimensional evaluation system assessing medical accuracy, comprehensiveness, understandability, reliability, and humanistic care (which encompasses attention to individual needs, emotional support, and ethical considerations).
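The five evaluation dimensions named above could be aggregated as in the sketch below; the dimension names come from the abstract, while the 1-5 rating scale and the simple averaging are assumptions made for illustration, not the study's scoring procedure.

```python
from statistics import mean

# Dimension names taken from the abstract; the scale is an assumption.
DIMENSIONS = ["medical_accuracy", "comprehensiveness", "understandability",
              "reliability", "humanistic_care"]

def score_response(ratings: dict) -> dict:
    """Validate per-dimension ratings (assumed 1-5 Likert) and summarize."""
    for dim in DIMENSIONS:
        value = ratings.get(dim)
        if value is None or not 1 <= value <= 5:
            raise ValueError(f"Missing or out-of-range rating for {dim}")
    return {"per_dimension": ratings,
            "overall": mean(ratings[d] for d in DIMENSIONS)}

# Example: hypothetical ratings for one model's answer to one question.
print(score_response({"medical_accuracy": 5, "comprehensiveness": 4,
                      "understandability": 4, "reliability": 5,
                      "humanistic_care": 3}))
```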