Citations

20

Article Abstract

Large language models (LLMs) can potentially enhance the accessibility and quality of medical information. This study evaluates the reliability and quality of responses generated by ChatGPT-4, an LLM-driven chatbot, compared with those written by physicians, focusing on otorhinolaryngological advice in real-world, text-based workflows. Physician responses to inquiries posted on a public social media forum were anonymized, and ChatGPT-4 generated replies to the same inquiries. A panel of seven board-certified otorhinolaryngologists assessed both sets of responses using six criteria: overall quality, empathy, alignment with medical consensus, information accuracy, inquiry comprehension, and harm potential. Ordinal logistic regression analysis identified factors influencing response quality. ChatGPT-4 responses were preferred in 70.7% of cases and were significantly longer (median: 162 words) than physician responses (median: 67 words; P < .0001). The chatbot's responses received higher ratings across all criteria, and the key predictors of higher quality were greater empathy, stronger alignment with medical consensus, lower potential for harm, and fewer inaccuracies. ChatGPT-4 consistently outperformed physicians in generating responses that adhered to medical consensus, were accurate, and conveyed empathy. These findings suggest that integrating AI tools into text-based healthcare consultations could help physicians better address complex, nuanced inquiries and provide high-quality, comprehensive medical advice.
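The abstract reports an ordinal logistic regression without stating its exact form. A common choice for ordered ratings is the proportional-odds (cumulative logit) model; assuming that form, this Python sketch shows how a linear combination of predictors is turned into a probability for each rating level. The predictors, coefficients, thresholds, and five-level scale are hypothetical placeholders, not the study's variables or estimates.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_category_probs(
    x: Sequence[float],
    beta: Sequence[float],
    thresholds: Sequence[float],
) -> List[float]:
    """Probabilities of K ordered categories under a proportional-odds model.

    Cumulative form: P(Y <= j | x) = sigmoid(theta_j - x . beta), j = 1..K-1,
    with strictly increasing thresholds theta_1 < ... < theta_{K-1}.
    A positive coefficient shifts probability toward higher categories.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if any(later <= earlier for earlier, later in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x . beta
    # Cumulative probabilities P(Y <= 1), ..., P(Y <= K-1), then P(Y <= K) = 1.
    cumulative = [sigmoid(theta - eta) for theta in thresholds] + [1.0]
    # Each category's probability is the step between consecutive cumulative values.
    return [cumulative[0]] + [
        upper - lower for lower, upper in zip(cumulative, cumulative[1:])
    ]


# Hypothetical example: predictors are an empathy score, a consensus-alignment
# score, and a harm-potential score, each on an assumed 1-5 scale.
# Coefficients and thresholds are made up for illustration, not study estimates.
beta = [0.8, 1.1, -0.9]
thresholds = [1.0, 2.5, 4.0, 5.5]  # 4 cut points -> 5 ordered quality levels

weak_reply = ordinal_category_probs([2, 2, 4], beta, thresholds)
strong_reply = ordinal_category_probs([5, 5, 1], beta, thresholds)

print("weak:  ", [round(p, 3) for p in weak_reply])
print("strong:", [round(p, 3) for p in strong_reply])
```

With these made-up values, the reply with higher empathy and consensus scores and lower harm potential places most of its probability on the top quality level, matching by construction the direction the abstract describes for its key predictors. In a real analysis the coefficients and thresholds would be estimated from the panel's ratings, for example with statsmodels' `OrderedModel` using a logit link.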

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12215459
DOI: http://dx.doi.org/10.1038/s41598-025-06769-1

Publication Analysis

Top Keywords

large language: 8
responses: 6
comparison physician: 4
physician large: 4
language model: 4
model chatbot: 4
chatbot responses: 4
responses online: 4
online ear: 4
ear nose: 4

Similar Publications

Large language models (LLMs) have been successfully used for data extraction from free-text radiology reports. Most current studies were conducted with LLMs accessed via an application programming interface (API). We evaluated the feasibility of using open-source LLMs deployed on limited local hardware resources to extract data from free-text mammography reports, using a common data element (CDE)-based structure.

Inflammatory gene expression profile of oral plasmablastic lymphoma.

Virchows Arch

September 2025

Department of Oral Surgery and Pathology, School of Dentistry, Universidade Federal de Minas Gerais, Minas Gerais, Av. Antônio Carlos, Pampulha, Belo Horizonte, 31270-901, Brazil.

Plasmablastic lymphoma (PBL) is a rare and aggressive non-Hodgkin lymphoma with a poor prognosis and short survival. It is classified as a large B-cell lymphoma subtype but carries a plasmacytic immunophenotype. Therefore, PBL has pathogenetic overlaps with diffuse large B-cell lymphoma not otherwise specified (DLBCL NOS) and plasma cell neoplasms (PCNs).

Purpose: Degenerative lumbar spinal stenosis (DLSS) represents an increasing challenge due to the aging population. The natural course of untreated DLSS is largely unknown. For acute DLSS decompensations, the main concern remains the opportunity and timing of surgery, i.

Active use of latent tree-structured sentence representation in humans and large language models.

Nat Hum Behav

September 2025

Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China.

Understanding how sentences are represented in the human brain, as well as in large language models (LLMs), poses a substantial challenge for cognitive science. Here we develop a one-shot learning task to investigate whether humans and LLMs encode tree-structured constituents within sentences. Participants (total N = 372, native Chinese or English speakers, and bilingual in Chinese and English) and LLMs (for example, ChatGPT) were asked to infer which words should be deleted from a sentence.

GPT-4o, a general-purpose large language model, has a retrieval-augmented variant (GPT-4o-RAG) that can assist in dietary counseling. However, research on its application in this field is lacking. To bridge this gap, we used the Japanese National Examination for Registered Dietitians as a standardized benchmark for evaluation.