Article Abstract

Background and Purpose: Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate patient selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information.

Materials and Methods: We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured the unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM's performance against that of a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data.
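As a rough illustration of what structuring a free-text note with a single worked example could look like, the sketch below builds a one-shot prompt and parses a JSON reply. It is not the authors' pipeline: the field names, the example note, and the helper functions are all hypothetical, and the call to the LLM itself is left out.

```python
import json

# Hypothetical field names, loosely based on the features named in the abstract.
FEATURES = ["general_condition", "disease_extent"]

# Invented example note and answer, used as the single demonstration.
EXAMPLE_NOTE = "Pt ambulatory, ECOG 1. Multiple liver and bone metastases."
EXAMPLE_OUTPUT = {"general_condition": "good", "disease_extent": "extensive"}


def build_one_shot_prompt(note: str) -> str:
    """Return a prompt with one solved example followed by the new note."""
    return (
        "Extract the following fields from the clinical note as JSON: "
        + ", ".join(FEATURES)
        + ".\n\n"
        + f"Note: {EXAMPLE_NOTE}\nJSON: {json.dumps(EXAMPLE_OUTPUT)}\n\n"
        + f"Note: {note}\nJSON:"
    )


def parse_structured_output(raw: str) -> dict:
    """Parse the model's JSON reply, keeping only the expected fields."""
    data = json.loads(raw)
    # Fields the model omitted are recorded as None rather than guessed.
    return {name: data.get(name) for name in FEATURES}
```

The parsed dictionary could then be joined to the structured variables as extra model inputs; how the study actually encoded these features is not stated in the abstract.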

Results: The open-source LLM structured unstructured EHR data with 87.5% accuracy, outperforming the domain-specific medical LLM (35.8%). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model's C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data.
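For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index on toy data. It assumes higher risk scores mean shorter expected survival and, for simplicity, skips pairs with tied observed times; the function name and the toy values are illustrative and do not come from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped here (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: observed times, event flags (1 = death, 0 = censored), risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))  # 0.8 (4 of 5 comparable pairs)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported change from 0.737 to 0.820 should be read.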

Conclusion: General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.

Source
http://dx.doi.org/10.1016/j.radonc.2025.111052

Similar Publications

Identifying levels of alcohol use disorder severity in electronic health records.

Subst Abuse Treat Prev Policy

September 2025

Centre for Interdisciplinary Addiction Research (ZIS), Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20246, Hamburg, Germany.

Background: Alcohol use disorder (AUD) is conceptualized as a dimensional phenomenon in the DSM-5, but electronic health records (EHRs) rely on binary AUD definitions according to the ICD-10. The present study classifies AUD severity levels using EHR data and tests whether increasing AUD severity levels are linked with increased comorbidity.

Methods: Billing data from two German statutory health insurance companies in Hamburg included n = 21,954 adults diagnosed with alcohol-specific conditions between 2017 and 2021.

Background: Electronic health records (EHRs) have been linked to information overload, which can lead to cognitive fatigue, a precursor to burnout. This can cause health care providers to miss critical information and make clinical errors, leading to delays in care delivery. This challenge is particularly pronounced in medical intensive care units (ICUs), where patients are critically ill and their EHRs contain extensive and complex data.

Capturing the dynamic changes in patients' internal states as they approach death due to fatal diseases remains a major challenge in understanding individual pathologies and improving end-of-life care. However, existing methods primarily focus on specific test values or organ dysfunction markers, failing to provide a comprehensive view of the evolving internal state preceding death. To address this, we analyzed electronic health record (EHR) data from a single institution, including 8,976 cancer patients and 77 laboratory parameters, by constructing continuous mortality prediction models based on gradient-boosting decision trees and leveraging them for temporal analyses.

There is a lack of longitudinal data on type 2 diabetes (T2D) in low- and middle-income countries. We leveraged the electronic health records (EHR) system of a publicly funded academic institution to establish a retrospective cohort with longitudinal data to facilitate benchmarking, surveillance, and resource planning of a multi-ethnic T2D population in Malaysia. This cohort included 15,702 adults aged ≥ 18 years with T2D who received outpatient care (January 2002-December 2020) from Universiti Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia.

The Burden of Cancer and Precancerous Conditions Among Transgender Individuals in a Large Health Care Network: Retrospective Cohort Study.

JMIR Cancer

September 2025

Department of Health Outcomes and Biomedical Informatics, University of Florida, 1889 Museum Road, Suite 7000, Gainesville, FL, 32611, United States, 1 352 294-5969.

Background: Disparities in cancer burden between transgender and cisgender individuals remain an underexplored area of research.

Objective: This study aimed to examine the cumulative incidence and associated risk factors for cancer and precancerous conditions among transgender individuals compared with matched cisgender individuals.

Methods: We conducted a retrospective cohort study using patient-level electronic health record (EHR) data from the University of Florida Health Integrated Data Repository between 2012 and 2023.
