Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records.

Zhao Li, Lan Lan, Yujia Zhou, Ruoxing Li, Kenneth D Chavin, Hua Xu, Liang Li, David J H Shih, W Jim Zheng

medRxiv

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, Texas, 77030.

Published: November 2023

Background: Deep learning models have shown great success and potential when applied to many biomedical problems. However, when structured electronic health records data are used, the accuracy of deep learning models for many disease prediction problems is affected by time-varying covariates, rare incidence, and covariate imbalance. The situation is further exacerbated when predicting the risk of one disease conditional on another, such as the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease, because of the slow, chronic progression, the scarcity of data on patients with both conditions, and the sex bias of the diseases.

Objective: The goal of this study was to investigate the extent to which time-varying covariates, rare incidence, and covariate imbalance influence deep learning performance, and to devise strategies to tackle these challenges. These strategies were then applied to improve hepatocellular carcinoma risk prediction among patients with nonalcoholic fatty liver disease.

Methods: We evaluated two representative deep learning models in the task of predicting the occurrence of hepatocellular carcinoma in a cohort of patients with nonalcoholic fatty liver disease (n = 220,838) from a national EHR database. The disease prediction task was carefully formulated as a classification problem while taking censoring and the length of follow-up into consideration.
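The abstract does not give the exact labeling rule. As a minimal sketch of how a risk prediction task can be cast as classification while respecting censoring and follow-up length, the Python below uses a common fixed-window convention. The `Patient` fields, the `label` function, and the `window_days` parameter are all hypothetical names chosen for illustration, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical fields, both measured in days from the index date.
    followup_days: int              # time to the last recorded encounter
    hcc_day: Optional[int] = None   # time to first HCC diagnosis, if any


def label(patient: Patient, window_days: int) -> Optional[int]:
    """Assumed convention: 1 = case, 0 = control, None = excluded as censored."""
    # Case: HCC was diagnosed inside the prediction window.
    if patient.hcc_day is not None and patient.hcc_day <= window_days:
        return 1
    # Control: observed for the whole window with no HCC inside it.
    if patient.followup_days >= window_days:
        return 0
    # Follow-up ended before the window closed, so the outcome is unknown.
    return None
```

Under this convention, a patient who leaves the database early is excluded and is not counted as a negative, which avoids treating short follow-up as evidence of no disease.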

Results: We developed a novel backward masking scheme to evaluate how the length of longitudinal information after the index date affects disease prediction. We observed that modeling time-varying covariates improved the performance of the algorithms, and that transfer learning mitigated the reduced performance caused by the lack of data. In addition, covariate imbalance, such as sex bias in the data, impaired performance. Deep learning models trained on one sex and evaluated on the other showed reduced performance, indicating the importance of assessing covariate imbalance while preparing data for model training.
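The abstract does not describe the backward masking scheme in detail. One simple way to vary how much post-index history a model sees, shown here purely as an assumption and not as the authors' scheme, is to hide every event recorded more than a chosen number of days after the index date. The `Event` layout, the `mask_after` function, and the `[MASK]` token are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event layout: (days since index date, medical code).
Event = Tuple[int, str]


def mask_after(events: List[Event], keep_days: int,
               mask_token: str = "[MASK]") -> List[Event]:
    """Replace the code of every event later than keep_days with mask_token.

    Events on or before keep_days (including pre-index events with a
    negative day offset) are left unchanged, so sequence length is preserved.
    """
    return [
        (day, code if day <= keep_days else mask_token)
        for day, code in events
    ]
```

Evaluating the same trained model at several `keep_days` values would then show how performance changes with the amount of longitudinal information available after the index date.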

Conclusions: Devising proper strategies to address challenges from time-varying covariates, lack of data, and covariate imbalance can be key to counteracting data bias and accurately predicting disease occurrence using deep learning models. The novel strategies developed in this work can significantly improve the performance of hepatocellular carcinoma risk prediction among patients with nonalcoholic fatty liver disease. Furthermore, these strategies can be generalized to other disease risk predictions using structured electronic health records, especially for the risk of one disease conditional on another.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680899	PMC
http://dx.doi.org/10.1101/2023.11.17.23298691	DOI Listing

Similar Publications

The Role of Deep Cerebral Tracts in Predicting Postoperative Aphasia: An nTMS-Based Investigation of the Corticothalamic Fibers.

Hum Brain Mapp

September 2025

Department of Neurosurgery, Heidelberg University Hospital, Heidelberg, Germany.

Zixu Bao, Haosu Zhang, Maximilian Schwendner, Axel Schröder, Bernhard Meyer

Postoperative aphasia (POA) is a common complication in patients undergoing surgery for language-eloquent lesions. This study aimed to enhance the prediction of POA by leveraging preoperative navigated transcranial magnetic stimulation (nTMS) language mapping and diffusion tensor imaging (DTI)-based tractography, incorporating deep learning (DL) algorithms. One hundred patients with left-hemispheric lesions were retrospectively enrolled (43 developed postoperative aphasia, as the POA group; 57 did not, as the non-aphasia (NA) group).

MOLECULE: Molecular-dynamics and Optimized deep Learning for Entropy-regularized Classification and Uncertainty-aware Ligand Evaluation.

J Chem Theory Comput

September 2025

Dipartimento di Chimica, Università di Pavia, Via Taramelli 12, Pavia 27100, Italy.

Ivan Cucchi, Elena Frasnetti, Francesco Frigerio, Fabrizio Cinquini, Silvia Pavoni

Machine learning (ML) and deep learning (DL) methodologies have significantly advanced drug discovery and design in several respects. Additionally, the integration of structure-based data has been shown to support and improve the models' predictions. Indeed, we previously demonstrated that combining molecular dynamics (MD)-derived descriptors with ML models allows effective classification of kinase ligands as allosteric or orthosteric.

Leveraging Deep Learning to Address Diagnostic Challenges with Insufficient Image Data.

ACS Sens

September 2025

Institute of Applied Mechanics, National Taiwan University, Taipei 106, Taiwan.

Jian-Ming Lu, Ping-Yeh Chiu, Chien-Fu Chen

In recent AI-driven disease diagnosis, the success of models has depended mainly on extensive data sets and advanced algorithms. However, creating traditional data sets for rare or emerging diseases presents significant challenges. To address this issue, this study introduces a direct-self-attention Wasserstein generative adversarial network (DSAWGAN) designed to improve diagnostic capabilities in infectious diseases with limited data availability.

Few-shot learning for highly accelerated 3D time-of-flight MRA reconstruction.

Magn Reson Med

September 2025

Centre for Integrative Neuroimaging, FMRIB Division, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK.

Hao Li, Mark Chiew, Iulius Dragonu, Peter Jezzard, Thomas W Okell

Purpose: To develop a deep learning-based reconstruction method for highly accelerated 3D time-of-flight MRA (TOF-MRA) that achieves high-quality reconstruction with robust generalization using extremely limited acquired raw data, addressing the challenge of time-consuming acquisition of high-resolution, whole-head angiograms.

Methods: A novel few-shot learning-based reconstruction framework is proposed, featuring a 3D variational network specifically designed for 3D TOF-MRA that is pre-trained on simulated complex-valued, multi-coil raw k-space datasets synthesized from diverse open-source magnitude images and fine-tuned using only two single-slab experimentally acquired datasets. The proposed approach was evaluated against existing methods on acquired retrospectively undersampled in vivo k-space data from five healthy volunteers and on prospectively undersampled data from two additional subjects.

Automatic infant 2D pose estimation from videos: Comparing seven deep neural network methods.

Behav Res Methods

September 2025

Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Prague, Czech Republic.

Filipe Gama, Matěj Mísař, Lukáš Navara, Sergiu T Popescu, Matej Hoffmann

Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. There has been a rapid development of human pose estimation methods in computer vision, thanks to advances in deep learning and machine learning. However, these methods are trained on datasets that feature adults in different contexts.
