Medication information extraction using local large language models.

Phillip Richter-Pechanski , Marvin Seiferling , Christina Kiriakou , Dominic M Schwab , Nicolas A Geis , Christoph Dieterich , Anette Frank

J Biomed Inform

Department of Computational Linguistics, Heidelberg University, Im Neuenheimer Feld 325, 69120 Heidelberg, DE, Germany.

Published: September 2025

Objective: Medication information is crucial for clinical routine and research. However, a vast amount is stored in unstructured text, such as doctors' letters, and must be extracted manually, a resource-intensive and error-prone task. Automating this process comes with significant constraints in a clinical setup, including the demand for clinical expertise, limited time resources, restricted IT infrastructure, and the demand for transparent predictions. Recent advances in generative large language models (LLMs) and parameter-efficient fine-tuning methods show potential to address these challenges.

Methods: We evaluated local LLMs for end-to-end extraction of medication information, combining named entity recognition and relation extraction. We used format-restricting instructions and developed a feedback pipeline to facilitate automated evaluation. We applied token-level Shapley values to visualize and quantify token contributions and thereby improve the transparency of model predictions.
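As a rough illustration of how format-restricted output can be checked automatically and turned into feedback, the following minimal sketch validates generated lines against a fixed pattern. The `drug | type | value` line format, the label set, and the function names `parse_output` and `build_feedback` are assumptions made for this example; the abstract does not specify the authors' actual format or pipeline.

```python
from typing import List, Tuple

# Illustrative label set (assumption, not taken from the abstract).
ALLOWED_TYPES = {"Strength", "Dosage", "Frequency", "Route",
                 "Duration", "Form", "Reason", "ADE"}

Relation = Tuple[str, str, str]


def parse_output(text: str) -> Tuple[List[Relation], List[str]]:
    """Split model output into valid (drug, type, value) triples and error messages."""
    relations: List[Relation] = []
    errors: List[str] = []
    for n, line in enumerate(text.strip().splitlines(), start=1):
        parts = [p.strip() for p in line.split("|")]
        # A valid line has exactly three non-empty fields.
        if len(parts) != 3 or not all(parts):
            errors.append(f"line {n}: expected 'drug | type | value', got {line!r}")
            continue
        drug, rel_type, value = parts
        if rel_type not in ALLOWED_TYPES:
            errors.append(f"line {n}: unknown type {rel_type!r}")
            continue
        relations.append((drug, rel_type, value))
    return relations, errors


def build_feedback(errors: List[str]) -> str:
    """Turn format violations into a message that could be sent back to the model."""
    if not errors:
        return ""
    return ("Your previous answer violated the output format:\n"
            + "\n".join(errors)
            + "\nPlease answer again with one 'drug | type | value' triple per line.")


if __name__ == "__main__":
    sample = "aspirin | Dosage | 100 mg\nmetoprolol: twice daily"
    rels, errs = parse_output(sample)
    print(rels)
    print(build_feedback(errs))
```

Only lines that pass the check would be scored; the feedback string is what a retry loop might append to the next prompt.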

Results: Two open-source LLMs, one general (Llama) and one domain-specific (OpenBioLLM), were evaluated on the English n2c2 2018 corpus and the German CARDIO:DE corpus. OpenBioLLM frequently struggled with structured outputs and hallucinations. Fine-tuned Llama models achieved new state-of-the-art results, improving F1-score by up to 10 percentage points (pp) for adverse drug events and 6 pp for medication reasons on English data. On the German dataset, Llama established a new benchmark, outperforming traditional machine learning methods by up to 16 pp in micro-averaged F1-score.
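For readers less familiar with the metrics, the sketch below shows how a micro-averaged F1-score pools counts across classes and how a gain in percentage points is computed. The counts are invented toy numbers, not results from the paper, and the function names are chosen for this example.

```python
from typing import Dict


def micro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pp_gain(new_score: float, old_score: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return (new_score - old_score) * 100


if __name__ == "__main__":
    # Toy counts (hypothetical): a frequent class and a rare one.
    toy = {
        "Drug": {"tp": 90, "fp": 10, "fn": 10},
        "ADE": {"tp": 10, "fp": 10, "fn": 30},
    }
    print(round(micro_f1(toy), 4))
    print(round(pp_gain(0.80, 0.70), 1))
```

Because counts are pooled, frequent classes dominate the micro average, which is why gains on rare classes such as adverse drug events are usually reported separately.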

Conclusion: Our findings show that fine-tuned local open-source generative LLMs outperform state-of-the-art methods for medication information extraction on both English and German data, delivering high performance with limited time and IT resources in a real-world clinical setup. Applying Shapley values improved prediction transparency, supporting informed clinical decision-making.
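To make the idea of token-level Shapley values concrete, here is a minimal sketch that computes exact Shapley values by enumerating all token subsets. The toy `value_fn` stands in for a model's output score on a masked input and is purely hypothetical; exact enumeration is only feasible for a handful of tokens, and the abstract does not state which estimator the authors used.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(tokens: Sequence[str],
                   value_fn: Callable[[List[str]], float]) -> List[float]:
    """Exact Shapley value of each token for the score returned by value_fn."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes token i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                # Marginal contribution of token i to this coalition.
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


if __name__ == "__main__":
    # Hypothetical additive score: each kept token adds a fixed amount.
    toy_weights = {"aspirin": 0.6, "100": 0.3, "mg": 0.1, "daily": 0.0}

    def value_fn(kept: List[str]) -> float:
        return sum(toy_weights[t] for t in kept)

    tokens = ["aspirin", "100", "mg", "daily"]
    for tok, val in zip(tokens, shapley_values(tokens, value_fn)):
        print(f"{tok}: {val:.2f}")
```

The values sum to the difference between the score on the full input and the score on the empty input, which is what lets them be read as a per-token breakdown of a prediction.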

DOI: http://dx.doi.org/10.1016/j.jbi.2025.104898
