Pretrained patient trajectories for adverse drug event prediction using common data model-based electronic health records.

Junmo Kim , Joo Seong Kim , Ji-Hyang Lee , Min-Gyu Kim , Taehyun Kim , Chaeeun Cho , Rae Woong Park , Kwangsoo Kim

Commun Med (Lond)

Department of Transdisciplinary Medicine, Institute of Convergence Medicine with Innovative Technology, Seoul National University Hospital, Seoul, Republic of Korea.

Published: June 2025

Background: Pretraining language models on electronic health record (EHR) data has improved performance across various medical tasks. Despite this potential, the use of EHR pretraining models to predict adverse drug events (ADEs) has not been explored.

Methods: We used Observational Medical Outcomes Partnership (OMOP) common data model (CDM)-based EHR data from Seoul National University Hospital (SNUH), collected between January 2001 and December 2023, and from Ajou University Medical Center (AUMC), collected between January 2004 and December 2023. In total, 510,879 adult inpatients from SNUH and 419,505 from AUMC were included in the internal and external datasets, respectively. For pretraining, the model was trained to infer randomly masked tokens from the preceding and following history. In this process, we introduced domain embedding (DE) to provide information about the domain of each masked token, preventing the model from predicting codes from irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each finetuned model.
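The masking step and the role of DE can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `[MASK]` placeholder, the 15% default masking rate, the additive combination of embeddings, and the toy codes and vectors are all assumptions made for illustration. The point it shows is that a masked position loses its code but keeps its domain, so the input still tells the model which domain to predict from.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_with_domains(codes, domains, mask_prob=0.15, seed=0):
    """Randomly mask codes; domains are left untouched by design.

    Returns (masked_codes, labels). labels[i] holds the original code at
    masked positions and None elsewhere.
    """
    if len(codes) != len(domains):
        raise ValueError("codes and domains must have the same length")
    rng = random.Random(seed)
    masked, labels = [], []
    for code in codes:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(code)
        else:
            masked.append(code)
            labels.append(None)
    return masked, labels


def embed_inputs(masked_codes, domains, code_emb, domain_emb):
    """Assumed input scheme: code embedding + domain embedding per position."""
    return [
        [c + d for c, d in zip(code_emb[code], domain_emb[dom])]
        for code, dom in zip(masked_codes, domains)
    ]


if __name__ == "__main__":
    # Toy trajectory with made-up codes and 2-dimensional embeddings.
    codes = ["cond_A", "drug_B"]
    domains = ["condition", "drug"]
    code_emb = {MASK: [0.0, 0.0], "cond_A": [0.3, 0.1], "drug_B": [0.2, 0.9]}
    domain_emb = {"condition": [0.0, 1.0], "drug": [1.0, 0.0]}

    masked, labels = mask_with_domains(codes, domains, mask_prob=1.0)
    # Both positions are masked, yet their input vectors still differ
    # because the domain embedding is preserved.
    print(embed_inputs(masked, domains, code_emb, domain_emb))
```

In this toy setup, two masked positions from different domains receive different input vectors, which is the kind of signal that could steer the model away from codes in irrelevant domains.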

Results: Here we show that EHR pretraining models with DE outperform the models without pretraining and DE in predicting various ADEs, with an average area under the receiver operating characteristic curve (AUROC) of 0.958 and 0.964 in internal and external validation, respectively. For feature importance analysis, we demonstrate that the results are consistent with previously reported clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation is also available.
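For readers less familiar with the metric, the sketch below shows how an AUROC and its average over several prediction tasks could be computed. It uses the pairwise-ranking definition of AUROC (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half). The function names and the toy labels and scores are assumptions for illustration and are unrelated to the study data or the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Each (positive, negative) pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(tasks):
    """Unweighted average AUROC over a dict of task -> (labels, scores)."""
    values = [auroc(y, s) for y, s in tasks.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy example with two hypothetical prediction tasks.
    tasks = {
        "task_1": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        "task_2": ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
    }
    print(mean_auroc(tasks))
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for cohorts of the size reported here.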

Conclusions: The CDM-based EHR pretraining model with DE can improve prediction performance for various ADEs and can provide appropriate explanations at the cohort and patient levels. Our model has the potential to serve as a foundation model due to its strong prediction performance, interpretability, and compatibility.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166071
DOI: http://dx.doi.org/10.1038/s43856-025-00914-7

Similar Publications

TARGET-AI: a foundational approach for the targeted deployment of artificial intelligence electrocardiography in the electronic health record.

medRxiv

August 2025

Evangelos K Oikonomou , Bruno Batinica , Lovedeep Singh Dhingra , Arya Aminorroaya , Andreas Coppi

Background: Artificial intelligence (AI) applied to routine electrocardiograms (ECGs) offers promise for screening of structural heart disease (SHD), yet broad clinical integration remains limited by high false positive rates and the lack of tailored deployment strategies.

Methods: We developed TARGET-AI, a multimodal AI-enabled pipeline that integrates longitudinal electronic health record (EHR) data with ECG images to identify optimal intersections of healthcare encounters and patient phenotypes for targeted AI-ECG screening. The approach is built on (1) a foundation model pretrained on 118 million coded EHR events from 159,322 individuals to generate temporal patient embeddings and identify high-risk screening candidates, followed by (2) a contrastive vision-language model trained on 754,533 ECG-echocardiogram pairs to detect SHD with tunable performance characteristics.

Multi-Modal Fusion of Routine Care Electronic Health Records (EHR): A Scoping Review.

Information (Basel)

January 2025

Indiana University School of Medicine, Indiana University, 340 W 10th St, Indianapolis, IN 46202, USA.

Zina Ben-Miled , Jacob A Shebesh , Jing Su , Paul R Dexter , Randall W Grout

Background: Electronic health records (EHR) are now widely available in healthcare institutions to document the medical history of patients as they interact with healthcare services. In particular, routine care EHR data are collected for a large number of patients. These data span multiple heterogeneous elements (i.

Multimodal Deep Learning for ARDS Detection.

medRxiv

August 2025

Department of Mathematics University of California Davis Davis, CA, USA.

Stefan Broecker , Jason Y Adams , Girish Kumar , Rachael A Callcut , Yuan Ni

Objective: Poor outcomes in acute respiratory distress syndrome (ARDS) can be alleviated with tools that support early diagnosis. Current machine learning methods for detecting ARDS do not take full advantage of the multimodality of ARDS pathophysiology. We developed a multimodal deep learning model that uses imaging data, continuously collected ventilation data, and tabular data derived from a patient's electronic health record (EHR) to make ARDS predictions.

Benchmarking of pre-training strategies for electronic health record foundation models.

JAMIA Open

August 2025

Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA 94305, United States.

Samson Mataraso , Shreya D'Souza , David Seong , Eloïse Berson , Camilo Espinosa

Objective: Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.

Materials And Methods: We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months.

Improving Clinical Foundation Models with Multi-modal Learning and Domain Adaptation for Chronic Disease Prediction.

IEEE J Biomed Health Inform

August 2025

Wenhui Hou , Jianqiang Wang , Qika Lin , Xiaokang Wang , Ling Huang

Modelling patient trajectories from longitudinal electronic health records (EHRs) is crucial for early chronic disease prediction. Foundation models (FMs), benefiting from the computational power and generalization abilities, offer a promising direction towards understanding patient health progression. However, key challenges of adopting FMs in clinical decisions remain in (1) incorporating multi-modal EHR data into an FM effectively for unified patient representations and (2) ensuring model generalizability across various clinical domains with distribution shifts.
