Retrieval augmented generation for large language models in healthcare: A systematic review.

Lameck Mbangula Amugongo, Pietro Mascheroni, Steven Brooks, Stefan Doering, Jan Seidel

PLOS Digit Health

Biostatistics and Data Sciences Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.

Published: June 2025

Large Language Models (LLMs) have demonstrated promising capabilities to solve complex tasks in critical sectors such as healthcare. However, LLMs are limited by their training data, which is often outdated, by their tendency to generate inaccurate ("hallucinated") content, and by a lack of transparency in the content they generate. To address these limitations, retrieval augmented generation (RAG) grounds the responses of LLMs by exposing them to external knowledge sources. In the healthcare domain, however, there is currently no systematic understanding of which datasets, RAG methodologies and evaluation frameworks are available. This review aims to bridge this gap by assessing RAG-based approaches employed by LLMs in healthcare, focusing on the different steps of retrieval, augmentation and generation. Additionally, we identify the limitations, strengths and gaps in the existing literature. Our synthesis shows that 78.9% of studies used English datasets and 21.1% of the datasets are in Chinese. We find that a range of techniques are employed in RAG-based LLMs in healthcare, including Naive RAG, Advanced RAG, and Modular RAG. Surprisingly, proprietary models such as GPT-3.5/4 are the most used for RAG applications in healthcare. We find that there is a lack of standardised evaluation frameworks for RAG-based applications. In addition, the majority of the studies do not assess or address ethical considerations related to RAG in healthcare. It is important to account for the ethical challenges that are inherent when AI systems are implemented in the clinical setting. Lastly, we highlight the need for further research and development to ensure responsible and effective adoption of RAG in the medical domain.
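
The three steps named in the abstract (retrieval, augmentation, generation) can be illustrated with a minimal sketch of a Naive RAG loop. Everything below is an assumption made for illustration and is not taken from the review: the toy DOCUMENTS list, the bag-of-words cosine retriever, the prompt wording, and the placeholder echo_llm function that stands in for a real model call.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge source; a real system would index
# clinical guidelines, literature, or health records instead.
DOCUMENTS = [
    "Seasonal influenza vaccination is recommended for adults aged 65 years or older.",
    "Metformin is a common first-line medication for type 2 diabetes.",
    "Named entity recognition extracts drug and disease mentions from clinical notes.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Retrieval step: rank documents by similarity to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def augment(query, passages):
    """Augmentation step: place the retrieved passages in the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


def echo_llm(prompt):
    """Placeholder for a real LLM call (assumption, not a real API)."""
    return f"[model output for a prompt of {len(prompt)} characters]"


def rag_answer(query, llm=echo_llm, k=1):
    """Generation step: pass the grounded prompt to the model."""
    passages = retrieve(query, DOCUMENTS, k)
    prompt = augment(query, passages)
    return passages, prompt, llm(prompt)


if __name__ == "__main__":
    passages, prompt, answer = rag_answer("Who should receive influenza vaccination?")
    print(prompt)
    print(answer)
```

Swapping echo_llm for an actual model client and the bag-of-words scorer for a dense embedding retriever would move this sketch toward the Advanced and Modular RAG variants the abstract mentions, but those details are outside what the abstract specifies.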

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12157099
DOI: http://dx.doi.org/10.1371/journal.pdig.0000877

Similar Publications

Effectiveness, Usability, and Acceptability of ChatGPT With Retrieval-Augmented Generation (SIV-ChatGPT) in Increasing Seasonal Influenza Vaccination Uptake Among Older Adults: Quasi-Experimental Study.

J Med Internet Res

September 2025

School of Governance and Policy Science, The Chinese University of Hong Kong, Hong Kong, China (Hong Kong).

Zixin Wang, Tsz Hin Tsang, Fuk-Yuen Yu, Yuan Fang, Siyu Chen

Background: Older adults are more vulnerable to severe consequences caused by seasonal influenza. Although seasonal influenza vaccination (SIV) is effective and free vaccines are available, the SIV uptake rate remained inadequate among people aged 65 years or older in Hong Kong, China. There was a lack of studies evaluating ChatGPT in promoting vaccination uptake among older adults.

Enhancing Oncology-Specific Question Answering With Large Language Models Through Fine-Tuned Embeddings With Synthetic Data.

JCO Clin Cancer Inform

September 2025

Department of Applied AI and Data Science, City of Hope, Duarte, CA.

Kun-Han Lu, Sina Mehdinia, Kingson Man, Chi Wah Wong, Allen Mao

Purpose: The recent advancements of retrieval-augmented generation (RAG) and large language models (LLMs) have revolutionized the extraction of real-world evidence from unstructured electronic health records (EHRs) in oncology. This study aims to enhance RAG's effectiveness by implementing a retriever encoder specifically designed for oncology EHRs, with the goal of improving the precision and relevance of retrieved clinical notes for oncology-related queries.

Methods: Our model was pretrained with more than six million oncology notes from 209,135 patients at City of Hope.

Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models.

Res Sq

August 2025

Yao Ge, Sudeshna Das, Yuting Guo, Abeed Sarker

Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data).

Evaluating large language models in neuro-oncology: A comparative study of accuracy, completeness, and clinical usefulness.

J Clin Neurosci

September 2025

Nordwest-Krankenhaus Sanderbusch, Friesland Kliniken gGmbH, Department of Neurosurgery, Sande, Germany.

Shefqet Hajdari, Minaam Farooq, Aleeza Habib, Asad Ali Siddiqui, Laiba Sarfraz

Background: Large language models (LLMs), with their remarkable ability to retrieve and analyse information within seconds, are generating significant interest in the domain of healthcare. This study aims to assess and compare the accuracy, completeness, and usefulness of the responses of Gemini Advanced, ChatGPT-3.5, and ChatGPT-4 in neuro-oncology cases.

From data silos to insights: the PRINCE multi-agent knowledge engine for preclinical drug development.

Front Artif Intell

August 2025

Bayer Research and Development, Pharmaceuticals, Preclinical Development, Berlin, Germany.

Carlos Henrique Vieira-Vieira, Sarang Sanjay Kulkarni, Adam Zalewski, Jobst Löffler, Jonas Münch

The pharmaceutical industry faces pressure to improve the drug development process while reducing costs in an evolving regulatory landscape. This paper presents the Preclinical Information Center (PRINCE), a cloud-hosted data integration platform developed by Bayer AG in collaboration with Thoughtworks. PRINCE integrates decades of structured and unstructured safety study reports, leveraging a multi-agent architecture based on Large Language Models (LLMs) and advanced data retrieval methodologies, such as Retrieval-Augmented Generation and Text-to-SQL.
