Objectives: To compare the quality and time efficiency of physician-written summaries with customised large language model (LLM)-generated medical summaries integrated into the electronic health record (EHR) in a non-English clinical environment.
Design: Cross-sectional non-inferiority validation study.
Setting: Tertiary academic hospital.
Participants: 52 physicians from 8 specialties at a large Dutch academic hospital participated, either in writing summaries (n=42) or evaluating them (n=10).
Interventions: Physician writers wrote summaries of 50 patient records. LLM-generated summaries were created for the same records using an EHR-integrated LLM. An independent, blinded panel of physician evaluators compared physician-written summaries to LLM-generated summaries.
Primary And Secondary Outcome Measures: Primary outcome measures were completeness, correctness and conciseness (on a 5-point Likert scale). Secondary outcomes were preference and trust, and time to generate either the physician-written or LLM-generated summary.
Results: The completeness and correctness of LLM-generated summaries did not differ significantly from physician-written summaries. However, LLM summaries were less concise (3.0 vs 3.5, p=0.001). Overall evaluation scores were similar (3.4 vs 3.3, p=0.373), with 57% of evaluators preferring LLM-generated summaries. Trust in both summary types was comparable, and interobserver variability showed excellent reliability (intraclass correlation coefficient 0.975). Physicians took an average of 7 min per summary, while LLMs completed the same task in just 15.7 s.
Conclusions: LLM-generated summaries are comparable to physician-written summaries in completeness and correctness, although slightly less concise. With a clear time-saving benefit, LLMs could help reduce clinicians' administrative burden without compromising summary quality.
Download full-text PDF:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12414186
DOI: http://dx.doi.org/10.1136/bmjopen-2025-099301
BMJ Open
September 2025
Department of Medical Information Technology, University Medical Centre Groningen, Groningen, The Netherlands.
Sci Rep
May 2025
Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany.
This study explores the use of open-source large language models (LLMs) to automate the generation of German discharge summaries from structured clinical data. The structured data used to produce the AI-generated summaries were manually extracted from electronic health records (EHRs) by a trained medical professional. By leveraging structured documentation collected for research and quality management, the approach aims to provide physicians with editable draft summaries.
Front Digit Health
December 2024
Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Background: The rapid development of artificial intelligence (AI) has shown great potential in medical document generation. This study aims to evaluate the performance of Claude 3.5 Sonnet, an advanced AI model, in generating discharge summaries for patients with renal insufficiency, compared with human physicians.
JAMA Netw Open
December 2024
Department of Emergency Medicine, NewYork-Presbyterian/Weill Cornell Medicine, New York.
Importance: An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.
Objective: To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.
Design, Setting, And Participants: This cohort study used EM patient medical records with acute hospital admissions that occurred in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center.
Objective: To evaluate the clinical applications and limitations of chat generative pretrained transformer (ChatGPT) in otolaryngology.
Study Design: Cross-sectional survey.
Setting: Tertiary academic center.