Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records.

Sangjoon Park, Chan Woo Wee, Seo Hee Choi, Kyung Hwan Kim, Jee Suk Chang, Hong In Yoon, Ik Jae Lee, Yong Bae Kim, Jaeho Cho, Ki Chang Keum, Chang Geol Lee, Hwa Kyung Byun, Woong Sub Koom

Radiother Oncol

Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea.

Published: July 2025

Background And Purpose: Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information.

Materials And Methods: We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM's performance against a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data.
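The structuring step can be pictured as a prompt that carries one worked example, followed by a parser that keeps only the expected fields. The sketch below is illustrative only and is not the authors' prompt or pipeline: the field names, the example note, and the label values are all hypothetical, and the call to an actual LLM is left out.

```python
import json

# Hypothetical target fields; the abstract names general condition and
# disease extent only as examples of clinically relevant features.
FIELDS = ("general_condition", "disease_extent")

# Invented demonstration pair used as the single in-prompt example.
EXAMPLE_NOTE = "Pt bedridden most of the day. Multiple liver and bone mets."
EXAMPLE_ANSWER = {"general_condition": "poor", "disease_extent": "metastatic"}


def build_one_shot_prompt(note):
    """Return a prompt with one worked example followed by the new note."""
    return (
        "Extract the fields " + ", ".join(FIELDS)
        + " from the clinical note as JSON.\n\n"
        + f"Note: {EXAMPLE_NOTE}\nJSON: {json.dumps(EXAMPLE_ANSWER)}\n\n"
        + f"Note: {note}\nJSON:"
    )


def parse_structured_output(raw):
    """Keep only the expected fields; missing or unparseable values become None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}
```

Restricting the parsed output to a fixed field list means a malformed or over-long model response degrades to missing values rather than breaking the downstream survival model.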

Results: The open-source LLM structured unstructured EHR data with 87.5% accuracy, outperforming the domain-specific medical LLM (35.8%). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model's C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data.
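For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index, assuming right-censored data and a model that outputs a risk score where higher means earlier expected death. The abstract does not say which C-index variant or software the authors used, so this is a generic illustration; pairs with tied observed times are simply skipped here.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs whose risk ordering matches survival.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk scores (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed death;
        # tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported change from 0.737 to 0.820 should be read.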

Conclusion: General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.

DOI: http://dx.doi.org/10.1016/j.radonc.2025.111052

Similar Publications

Identifying levels of alcohol use disorder severity in electronic health records.

Subst Abuse Treat Prev Policy

September 2025

Centre for Interdisciplinary Addiction Research (ZIS), Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20246, Hamburg, Germany.

Jakob Manthey, Carolin Kilian, Ludwig Kraus, Ingo Schäfer, Anna Schranz

Background: Alcohol use disorder (AUD) is conceptualized as a dimensional phenomenon in the DSM-5, but electronic health records (EHRs) rely on binary AUD definitions according to the ICD-10. The present study classifies AUD severity levels using EHR data and tests whether increasing AUD severity levels are linked with increased comorbidity.

Methods: Billing data from two German statutory health insurance companies in Hamburg included n = 21,954 adults diagnosed with alcohol-specific conditions between 2017 and 2021.

Investigating Information Visualization to Combat Information Overload in Electronic Health Records: Protocol for a Randomized Controlled Trial.

JMIR Res Protoc

September 2025

School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.

Saif Khairat, Jennifer Morelli, Marcella H Boynton, Thomas Bice, Jeffrey A Gold

Background: Electronic health records (EHRs) have been linked to information overload, which can lead to cognitive fatigue, a precursor to burnout. This can cause health care providers to miss critical information and make clinical errors, leading to delays in care delivery. This challenge is particularly pronounced in medical intensive care units (ICUs), where patients are critically ill and their EHRs contain extensive and complex data.

Predicting mortality dynamics in cancer patients: A machine learning approach to pre-death events.

PLoS One

September 2025

Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Tatsuki Yamamoto, Minoru Sakuragi, Yuzuha Tuji, Yuji Okamoto, Eiichiro Uchino

Capturing the dynamic changes in patients' internal states as they approach death due to fatal diseases remains a major challenge in understanding individual pathologies and improving end-of-life care. However, existing methods primarily focus on specific test values or organ dysfunction markers, failing to provide a comprehensive view of the evolving internal state preceding death. To address this, we analyzed electronic health record (EHR) data from a single institution, including 8,976 cancer patients and 77 laboratory parameters, by constructing continuous mortality prediction models based on gradient-boosting decision trees and leveraging them for temporal analyses.

Cohort profile: The Multiethnic Lifestyle, Obesity and Diabetes Registry in Malaysia (MeLODY) retrospective cohort in a middle-income country in Southeast Asia.

PLoS One

September 2025

Department of Medicine, Faculty of Medicine, Universiti Malaya, Kuala Lumpur, Malaysia.

Sarkaaj Singh, Anis Syazwani Abd Raof, Jian-Wen Samuel Lee-Boey, Hana Salwani Mohd Zaini, Ying Guat Ooi

There is a lack of longitudinal data on type 2 diabetes (T2D) in low- and middle-income countries. We leveraged the electronic health records (EHR) system of a publicly funded academic institution to establish a retrospective cohort with longitudinal data to facilitate benchmarking, surveillance, and resource planning of a multi-ethnic T2D population in Malaysia. This cohort included 15,702 adults aged ≥ 18 years with T2D who received outpatient care (January 2002-December 2020) from Universiti Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia.

The Burden of Cancer and Precancerous Conditions Among Transgender Individuals in a Large Health Care Network: Retrospective Cohort Study.

JMIR Cancer

September 2025

Department of Health Outcomes and Biomedical Informatics, University of Florida, 1889 Museum Road, Suite 7000, Gainesville, FL, 32611, United States, 1 352 294-5969.

Shuang Yang, Yongqiu Li, Christopher W Wheldon, Jessica Y Islam, Mattia Prosperi

Background: Disparities in cancer burden between transgender and cisgender individuals remain an underexplored area of research.

Objective: This study aimed to examine the cumulative incidence and associated risk factors for cancer and precancerous conditions among transgender individuals compared with matched cisgender individuals.

Methods: We conducted a retrospective cohort study using patient-level electronic health record (EHR) data from the University of Florida Health Integrated Data Repository between 2012 and 2023.
