Purpose: To synthesize reported performance and improvement strategies for adapting generative large language models (LLMs) in electronic health record (EHR) analyses and applications.
Methods: We followed the PRISMA guidelines to conduct a systematic review of articles from PubMed and Web of Science published between January 1, 2023 and November 9, 2024. Multiple reviewers, including biomedical informaticians and a clinician, were involved in the article review process. Studies were included if they used generative LLMs to analyze real-world EHR data and reported quantitative performance evaluations for an improvement technique. The review identified key clinical applications and summarized the reported performance and improvement strategies.
Results: Of the 18,735 articles retrieved, 196 met our criteria. Of these, 112 (57.1%) studies used generative LLMs for clinical decision support tasks, 40 (20.4%) involved documentation tasks, 39 (19.9%) involved information extraction tasks, 11 (5.6%) involved patient communication tasks, and 10 (5.1%) included summarization tasks. Among the 196 studies, most (88.8%) did not quantitatively evaluate LLM performance improvement strategies; the remaining twenty-four studies (12.2%) quantitatively evaluated the effectiveness of in-context learning (9 studies), fine-tuning (12 studies), multimodal integration (8 studies), and ensemble learning (2 studies). Three studies highlighted that few-shot prompting, fine-tuning, and multimodal data integration might not improve performance, and another two studies found that fine-tuning a smaller model could outperform a larger model.
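The improvement strategies counted above are general-purpose LLM techniques rather than methods defined in this review. As an illustration only, the minimal sketch below shows what in-context (few-shot) learning might look like for an EHR information-extraction task; the example notes, labels, and the `call_llm` stub are hypothetical placeholders and do not come from any of the reviewed studies.

```python
# Hypothetical sketch of in-context (few-shot) learning for an EHR
# information-extraction task. The example notes, labels, and the
# call_llm() stub are illustrative placeholders, not artifacts of any
# study included in this review.

FEW_SHOT_EXAMPLES = [
    {
        "note": "Pt started on metformin 500 mg BID for T2DM.",
        "medications": ["metformin 500 mg BID"],
    },
    {
        "note": "Continue lisinopril 10 mg daily; discontinue aspirin.",
        "medications": ["lisinopril 10 mg daily"],
    },
]


def build_few_shot_prompt(note: str) -> str:
    """Prepend labeled examples so the model can infer the task in context."""
    parts = ["Extract the active medications from each clinical note."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Note: {ex['note']}\nMedications: {', '.join(ex['medications'])}"
        )
    parts.append(f"Note: {note}\nMedications:")
    return "\n\n".join(parts)


def call_llm(prompt: str) -> str:
    """Placeholder for a call to a generative LLM (API or local model)."""
    return "<model output would appear here>"


if __name__ == "__main__":
    new_note = "Patient restarted on warfarin 5 mg nightly after bridging."
    prompt = build_few_shot_prompt(new_note)
    print(prompt)
    print(call_llm(prompt))
```

In studies that quantify this strategy, such a few-shot prompt would typically be compared against a zero-shot baseline on the same extraction task.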
Conclusion: Applying a performance improvement strategy does not necessarily lead to better performance. Detailed guidance on how to apply these strategies effectively and safely is needed, and it should be informed by more quantitative analyses in future work.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413914 | PMC |
| http://dx.doi.org/10.1016/j.ijmedinf.2025.106091 | DOI Listing |