Optimizing Data Extraction: Harnessing RAG and LLMs for German Medical Documents.

Yingding Wang, Simon Leutner, Michael Ingrisch, Christoph Klein, Ludwig Christian Hinske, Katharina Danhauser

Stud Health Technol Inform

Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany.

Published: August 2024

In medical data analysis, converting unstructured text documents into a structured format suitable for further use is a significant challenge. This study introduces an automated, locally deployed, privacy-preserving pipeline that uses open-source Large Language Models (LLMs) with a Retrieval-Augmented Generation (RAG) architecture to convert German-language medical documents containing sensitive health-related information into a structured format. Tested on a proprietary dataset of 800 unstructured original medical reports, the pipeline achieved an accuracy of up to 90% in data extraction when compared with data extracted manually by physicians and medical students. This highlights the pipeline's potential as a valuable tool for efficiently extracting relevant data from unstructured sources.
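The abstract does not specify the retriever, the model, the prompts, or the extracted fields, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a simple bag-of-words cosine similarity as a stand-in for an embedding-based retriever, builds an illustrative extraction prompt from the retrieved chunks, and scores extracted fields against manual annotations. All function names, the prompt wording, and the scoring rule are hypothetical, and the call to a locally hosted LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3, so umlauts are kept.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(chunks, query, k=2):
    # Rank chunks by similarity to the query and keep the top k (the retrieval step).
    # Assumption: a real pipeline would likely use dense embeddings instead.
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(Counter(tokenize(c)), q), reverse=True
    )
    return ranked[:k]


def build_prompt(field, context_chunks):
    # Illustrative prompt wording, not taken from the study.
    context = "\n".join(context_chunks)
    return (
        f"Context:\n{context}\n\n"
        f"Extract the value of '{field}'. Answer with the value only."
    )


def field_accuracy(extracted, reference):
    # Share of manually annotated fields whose extracted value matches exactly.
    hits = sum(1 for key, value in reference.items() if extracted.get(key) == value)
    return hits / len(reference)


# Toy report split into chunks (invented example text, not study data).
chunks = [
    "Diagnose: Asthma bronchiale.",
    "Medikation: Salbutamol bei Bedarf.",
    "Der Patient wurde am Montag entlassen.",
]
top_chunks = retrieve(chunks, "Diagnose", k=1)
prompt = build_prompt("Diagnose", top_chunks)
```

The resulting `prompt` would be passed to a locally deployed open-source model, and the parsed answers collected into a dictionary that `field_accuracy` can compare with the values recorded by human annotators. Exact string matching is the simplest possible scoring rule; the study may have used a different comparison.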

DOI: http://dx.doi.org/10.3233/SHTI240567
