Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset.

Chuizheng Meng, Loc Trinh, Nan Xu, James Enouen, Yan Liu

Sci Rep

Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA.

Published: May 2022

The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of interpretability, dataset representation bias, and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models and also recognizes new important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising fairness concerns. We therefore evaluate the fairness of the models and do observe unfairness: (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors. Our analysis demonstrates that prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance might result from unfair use of demographic features. Our findings suggest that future research on AI models for healthcare applications can benefit from adopting this interpretability and fairness analysis workflow and from verifying whether models achieve superior performance at the cost of introducing bias.
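
As a rough illustration of the kind of check the abstract describes, the sketch below compares a binary mortality classifier across patient subgroups and then compares how strongly a demographic feature is attributed within each subgroup. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`group_rates`, `max_gap`, `mean_abs_importance`), the two generic gap measures (positive-prediction rate and true-positive rate), and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group positive-prediction rate and true-positive rate."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp      # predicted positive (e.g. predicted death)
        c["pos"] += yt           # actually positive
        c["tp"] += yt * yp       # predicted positive and actually positive
    return {
        g: {
            "positive_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
        }
        for g, c in counts.items()
    }


def max_gap(rates, key):
    """Largest difference in one rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


def mean_abs_importance(attributions, groups):
    """Mean absolute attribution of a single feature within each group."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for a, g in zip(attributions, groups):
        totals[g] += abs(a)
        sizes[g] += 1
    return {g: totals[g] / sizes[g] for g in totals}


# Toy data (hypothetical): labels, predictions, and subgroup membership.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Hypothetical per-patient attributions for one demographic feature,
# as an interpretability method might produce them.
attributions = [0.5, -0.25, 0.25, 0.0, 0.125, -0.125, 0.0, 0.0]

rates = group_rates(y_true, y_pred, groups)
print("positive-rate gap:", max_gap(rates, "positive_rate"))
print("true-positive-rate gap:", max_gap(rates, "tpr"))
print("mean |attribution| per group:", mean_abs_importance(attributions, groups))
```

A large gap in either rate, or a demographic feature whose attribution magnitude differs sharply between subgroups, would be a prompt for closer inspection rather than proof of unfairness on its own.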

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065125
DOI: http://dx.doi.org/10.1038/s41598-022-11012-2

Publication Analysis

Top Keywords

deep learning

models

interpretability fairness

learning models

models healthcare

healthcare applications

mortality prediction

prediction models

interpretability

prediction

Similar Publications

Oral bioavailability property prediction based on task similarity transfer learning.

Mol Divers

September 2025

Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, 211198, China.

Chen Zeng, Chengcheng Xu, Yingxu Liu, Yunya Jiang, Lidan Zheng

Drug absorption significantly influences pharmacokinetics. Accurately predicting human oral bioavailability (HOB) is essential for optimizing drug candidates and improving clinical success rates. HOB is traditionally obtained by experiment, but experimental measurement is time-consuming and costly.

Decoding binocular color differences via EEG signals: linking ERP dynamics to chromatic disparity in CIELAB space.

Exp Brain Res

September 2025

School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China.

Famiao Mou, Zhineng Lv, Xuesong Jin, Jijun Pan, Lijun Yun

This study explores how differences in colors presented separately to each eye (binocular color differences) can be identified through EEG signals, a method of recording electrical activity from the brain. Four distinct levels of green-red color differences, defined in the CIELAB color space with constant luminance and chroma, are investigated in this study. Analysis of Event-Related Potentials (ERPs) revealed a significant decrease in the amplitude of the P300 component as binocular color differences increased, suggesting a measurable brain response to these differences.

Clinical evaluation of motion robust reconstruction using deep learning in lung CT.

Phys Eng Sci Med

September 2025

Department of Radiology, Otaru General Hospital, Otaru, Hokkaido, Japan.

Shiho Kuwajima, Daisuke Oura

In lung CT imaging, motion artifacts caused by cardiac motion and respiration are common. Recently, CLEAR Motion, a deep learning-based reconstruction method that applies motion correction technology, has been developed. This study aims to quantitatively evaluate the clinical usefulness of CLEAR Motion.

Predicting complex time series with deep echo state networks.

Chaos

September 2025

School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Afrouz Delshad, Elizabeth M Cherry

Although many real-world time series are complex, developing methods that can learn from their behavior effectively enough to enable reliable forecasting remains challenging. Recently, several machine-learning approaches have shown promise in addressing this problem. In particular, the echo state network (ESN) architecture, a type of recurrent neural network where neurons are randomly connected and only the read-out layer is trained, has been proposed as suitable for many-step-ahead forecasting tasks.

An Explainable Deep Learning Model for Focal Liver Lesion Diagnosis Using Multiparametric MRI.

Radiol Artif Intell

September 2025

Department of Radiology, Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital, No. 197 Ruijin Er Road, Shanghai 200025, China.

Zhehan Shen, Lingzhi Chen, Lilong Wang, Shunjie Dong, Fakai Wang

Purpose: To assess the effectiveness of an explainable deep learning (DL) model, developed using multiparametric MRI (mpMRI) features, in improving diagnostic accuracy and efficiency of radiologists for classification of focal liver lesions (FLLs). Materials and Methods: FLLs ≥ 1 cm in diameter at mpMRI were included in the study. nn-Unet and Liver Imaging Feature Transformer (LIFT) models were developed using retrospective data from one hospital (January 2018-August 2023).
