Summarizing Online Patient Conversations Using Generative Language Models: Experimental and Comparative Study.

Rakhi Asokkumar Subjagouri Nair, Matthias Hartung, Philipp Heinisch, Janik Jaskolski, Cornelius Starke-Knäusel, Susana Veríssimo, David Maria Schmidt, Philipp Cimiano

JMIR Med Inform

Cognitive Interaction Technology Center, Faculty of Technology, Bielefeld University, Bielefeld, Germany.

Published: April 2025

Background: Social media is acknowledged by regulatory bodies (eg, the Food and Drug Administration) as an important source of patient experience data to learn about patients' unmet needs, priorities, and preferences. However, current methods either rely on manual analysis, which does not scale, or on automatic processing, which yields mainly quantitative insights. Methods that can automatically summarize texts and yield qualitative insights at scale are missing.

Objective: The objective of this study was to evaluate to what extent state-of-the-art large language models can appropriately summarize posts shared by patients in web-based forums and health communities. Specifically, the goal was to compare the performance of different language models and prompting strategies on the task of summarizing documents reflecting the experiences of individual patients.

Methods: In our experimental and comparative study, we applied 3 different language models (Flan-T5, Generative Pretrained Transformer-3 [GPT-3], and GPT-3.5) in combination with various prompting strategies to the task of summarizing posts from patients in online communities. The generated summaries were evaluated against 124 manually created summaries serving as a ground-truth reference. As evaluation metrics, we used 2 standard metrics from the field of text generation, namely, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and BERTScore, to compare the automatically generated summaries to the manually created reference summaries.
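To make the two evaluation metrics concrete, the following is a minimal Python sketch, not the evaluation code used in the study. ROUGE-N is computed as the F1 of clipped n-gram overlap and ROUGE-L as the F1 of the longest common subsequence of tokens. The regex tokenizer is an assumption (official ROUGE implementations tokenize differently and may apply stemming). The BERTScore part assumes token embeddings have already been obtained from a contextual encoder such as BERT, and it omits the optional idf weighting and baseline rescaling of the full metric; all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed simplified tokenizer: lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall (0 when both are 0).
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap."""
    def ngrams(tokens: list[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(tokenize(candidate)), ngrams(tokenize(reference))
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    return f1(overlap / sum(cand.values()), overlap / sum(ref.values()))


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 from the longest common subsequence (LCS) of tokens."""
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    # dp[i][j] holds the LCS length of c[:i] and r[:j].
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            if c[i - 1] == r[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    return f1(lcs / len(c), lcs / len(r))


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm


def greedy_bertscore(cand_vecs: list[list[float]],
                     ref_vecs: list[list[float]]) -> tuple[float, float, float]:
    """BERTScore-style (P, R, F1) from precomputed token embeddings.

    Each token is greedily matched to its most similar token on the other side;
    idf weighting and baseline rescaling are omitted.
    """
    precision = sum(max(cosine(c, x) for x in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(x, c) for c in cand_vecs) for x in ref_vecs) / len(ref_vecs)
    return precision, recall, f1(precision, recall)
```

For example, `rouge_n("the patient felt tired", "the patient was very tired")` finds 3 shared unigrams, giving precision 3/4 and recall 3/5, so F1 = 2/3. In practice, maintained packages (for example, the `rouge-score` and `bert-score` Python packages) would normally be preferred over hand-rolled versions.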

Results: Among the zero-shot prompting-based large language models investigated, GPT-3.5 performed better than the other models with respect to both the ROUGE metrics and BERTScore. While zero-shot prompting seems to be a good prompting strategy, GPT-3.5 in combination with directional stimulus prompting in a 3-shot setting achieved the best overall results on these metrics. A manual inspection of the summaries produced by the best-performing method showed that they were accurate and plausible compared to the manual summaries.
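The abstract does not give the prompt format, so the sketch below only illustrates what a zero-shot prompt and a 3-shot directional stimulus prompt could look like. In directional stimulus prompting, the model receives an auxiliary hint (such as keywords) intended to steer the output toward the desired content. Here each demonstration pairs a post with a keyword hint and its reference summary, and the target post receives a hint but no summary. The instruction wording, the use of keywords as hints, the `Example` class, and both builder functions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical instruction wordings; the study's actual prompts are not given in the abstract.
ZERO_SHOT_INSTRUCTION = "Summarize the patient's post."
DSP_INSTRUCTION = "Summarize the patient's post, using the hint as guidance."


@dataclass
class Example:
    post: str         # a patient's forum post
    hints: list[str]  # directional stimulus, assumed here to be keywords
    summary: str      # reference summary used as a demonstration


def build_zero_shot_prompt(post: str) -> str:
    """Zero-shot: instruction plus the target post, with no demonstrations or hints."""
    return f"{ZERO_SHOT_INSTRUCTION}\n\nPost: {post}\nSummary:"


def build_dsp_prompt(demos: list[Example], post: str, hints: list[str]) -> str:
    """k-shot directional stimulus prompt, where k = len(demos)."""
    lines = [DSP_INSTRUCTION, ""]
    for d in demos:
        lines += [f"Post: {d.post}", f"Hint: {'; '.join(d.hints)}", f"Summary: {d.summary}", ""]
    # The target post gets a hint but no summary; the model is expected to complete it.
    lines += [f"Post: {post}", f"Hint: {'; '.join(hints)}", "Summary:"]
    return "\n".join(lines)
```

Passing three `Example` objects yields the 3-shot setting, and an empty list degrades to a hinted zero-shot prompt. The returned string would then be sent to the chosen model; that call is left out because it depends on the provider's API.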

Conclusions: Taken together, our results suggest that state-of-the-art pretrained language models are a valuable tool for obtaining qualitative insights into the patient experience. Such insights can help to better understand unmet needs, patient priorities, and how a disease impacts daily functioning and quality of life, thereby informing processes aimed at improving health care delivery and ensuring that drug development focuses more on the actual priorities and unmet needs of patients. The key limitations of our work are the small data sample and the fact that the manual summaries were created by only 1 annotator. Furthermore, the results hold only for the examined models and prompting strategies and may not generalize to other models and strategies.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12038288
DOI: http://dx.doi.org/10.2196/62909

Similar Publications

Leveraging GPT-4o for Automated Extraction and Categorization of CAD-RADS Features From Free-Text Coronary CT Angiography Reports: Diagnostic Study.

JMIR Med Inform

September 2025

Departments of Radiology, The Third Affiliated Hospital, Sun Yat-Sen University, 600 Tianhe Road, Guangzhou, Guangdong, 510630, China, 86 18922109279, 86 20852523108.

Youmei Chen, Mengshi Dong, Jie Sun, Zhanao Meng, Yiqing Yang

Background: Despite the Coronary Artery Reporting and Data System (CAD-RADS) providing a standardized approach, radiologists continue to favor free-text reports. This preference creates significant challenges for data extraction and analysis in longitudinal studies, potentially limiting large-scale research and quality assessment initiatives.

Objective: To evaluate the ability of the generative pre-trained transformer (GPT)-4o model to convert real-world coronary computed tomography angiography (CCTA) free-text reports into structured data and automatically identify CAD-RADS categories and P categories.

Developing an Interprofessional Pediatric Rehabilitation Model of Care in Northern Cree First Nation Communities: Protocol for a Needs Assessment and Codeveloped Intervention With a Qualitative and Participatory Action Approach.

JMIR Res Protoc

September 2025

School of Rehabilitation Science, University of Saskatchewan, Saskatoon, SK, Canada.

Katie Crockett, Hailey Dunn, Rosalie Dostie, Karin Diedrich-Closson, Laureen McIntyre

Background: In Canada, the Indigenous population is the youngest and fastest growing, yet ongoing health disparities for Indigenous peoples are widely recognized. There is a concerning lack of research on childhood disabilities and health conditions in Indigenous populations in Canada. For children with disabilities and chronic health conditions, ongoing access to rehabilitation services, such as occupational therapy, physical therapy, speech-language pathology, and audiology, is critical in promoting positive health and developmental outcomes.

Multicriteria Assessment of Text Quality in Large Language Model-Generated Gynecomastia Materials: DeepSeek Versus OpenAI Versus Claude.

J Craniofac Surg

September 2025

Department of Breast Plastic Surgery, Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shijingshan, Beijing, China.

Tianying Zang, Jiaojiao Li, Lisha Wei, Yijin Wang

Background: With the development of artificial intelligence, obtaining patient-centered medical information through large language models (LLMs) is crucial for patient education. However, existing digital resources in online health care have heterogeneous quality, and the reliability and readability of content generated by various AI models need to be evaluated to meet the needs of patients with different levels of cultural literacy.

Objective: This study aims to compare the accuracy and readability of different LLMs in providing medical information related to gynecomastia, and explore the most promising science education tools in practical clinical applications.

Building MHealth technology for Health Promotion in Reproductive Planning: innovations in primary care.

Cien Saude Colet

August 2025

Universidade Federal do Triangulo Mineiro. Uberaba MG Brasil.

Edilson Rodrigues de Lima, Adriana Barbieri Feliciano, Tanyse Galon, Camila Almeida Neves de Oliveira, Heloisa Cristina Figueiredo Frizzo

Reproductive Planning is a recognized basic human right and is essential to guaranteeing the quality of health providers' work processes in Primary Health Care (PHC). This study aimed to build a prototype of a mobile application on reproductive planning to aid the ongoing education of nurses in PHC. This methodological study followed four stages: modeling, navigation design, abstract interface design, and implementation. The application's content, totaling 24 moblets, was built from a thematic categorization of the articles identified in an integrative review.

[The biopsychosocial model in disability evaluation: disability is not an ICD code].

Cien Saude Colet

August 2025

Programa de Pós-Graduação em Saúde Coletiva, Universidade de Brasília. Campus Universitário Darcy Ribeiro, Asa Norte. 70910-900 Brasília DF Brasil.

Indyara de Araujo Morais, Marineia Crosara de Resende, Edgar Merchan-Hamann, Everton Luís Pereira

The identification of people with disabilities for social policies is the subject of theoretical, political, and social dispute in Brazil. The aim is to transition from the biomedical model, based on medical reports with a code of the International Classification of Diseases and Related Health Problems (ICD), to the biopsychosocial model with a multi-professional and interdisciplinary evaluation as provided for in the Brazilian Law of Inclusion. This theoretical study aims to offer support for the discussion on the assessment of disability.
