Post-selection inference for high-dimensional mediation analysis with survival outcomes.

Scand Stat Theory Appl

Department of Biostatistics, Columbia University, New York, New York, USA.

Published: June 2025


Citations: 20

Article Abstract

It is of substantial scientific interest to detect mediators that lie in the causal pathway from an exposure to a survival outcome. However, with high-dimensional mediators, as often encountered in modern genomic data settings, there is a lack of powerful methods that can provide valid post-selection inference for the identified marginal mediation effect. To resolve this challenge, we develop a post-selection inference procedure for the maximally selected natural indirect effect using a semiparametric efficient influence function approach. To this end, we establish the asymptotic normality of a stabilized one-step estimator that takes the selection of the mediator into account. Simulation studies show that our proposed method has good empirical performance. We further apply our proposed approach to a lung cancer dataset and find multiple DNA methylation CpG sites that might mediate the effect of cigarette smoking on lung cancer survival.
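
The abstract does not spell out the estimator. As a rough illustration of the generic stabilized one-step idea (select using only past observations, evaluate the selected target on the next observation, weight by the inverse of an estimated standard deviation), the sketch below swaps natural indirect effects for plain column means of a simulated matrix, so the target is max_k E[X_k]. This is a toy analogue, not the authors' survival-mediation procedure; the names `stabilized_max_mean`, `naive_max_mean`, `burn_in` and the simulated data are hypothetical.

```python
import math
import random


def stabilized_max_mean(data, burn_in):
    """Toy stabilized one-step estimate of max_k E[X_k].

    data    -- n rows of p floats (hypothetical stand-ins for per-mediator effects)
    burn_in -- hypothetical number of initial rows used only for fitting (>= 2)
    Returns (estimate, standard_error).
    """
    n, p = len(data), len(data[0])
    if not 2 <= burn_in < n:
        raise ValueError("need 2 <= burn_in < n")
    sums = [0.0] * p     # running column sums over the rows seen so far
    sq_sums = [0.0] * p  # running column sums of squares
    for row in data[:burn_in]:
        for k, x in enumerate(row):
            sums[k] += x
            sq_sums[k] += x * x
    num = den = 0.0
    for j in range(burn_in, n):
        # Select with rows 0..j-1 only (all columns share the count j, so the
        # largest sum is also the largest mean).
        k_j = max(range(p), key=lambda k: sums[k])
        mean = sums[k_j] / j
        sd_j = math.sqrt((sq_sums[k_j] - j * mean * mean) / (j - 1))
        # For a mean, plug-in plus influence function at row j is just data[j][k_j].
        num += data[j][k_j] / sd_j
        den += 1.0 / sd_j
        # Only now fold row j into the running totals.
        for k, x in enumerate(data[j]):
            sums[k] += x
            sq_sums[k] += x * x
    m = n - burn_in
    # Inverse-sd weighted average; each centred, weighted term has variance near 1.
    return num / den, math.sqrt(m) / den


def naive_max_mean(data):
    """Naive plug-in: the largest full-sample column mean, ignoring selection."""
    n = len(data)
    return max(sum(col) / n for col in zip(*data))


random.seed(2025)
# Simulated null setting: 400 rows, 50 columns, every true mean equal to 0.
data = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(400)]
est, se = stabilized_max_mean(data, burn_in=100)
naive = naive_max_mean(data)
print(f"stabilized: {est:.3f} (se {se:.3f}); naive max mean: {naive:.3f}")
```

Because the column used at step j is chosen from earlier rows only, the value evaluated at row j is, given the past, an unbiased draw for that column, so `est ± 1.96 * se` behaves approximately like a valid interval even though every true mean is 0 (a non-unique maximum). The naive maximum of full-sample means, by contrast, is pushed above zero by the selection itself.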


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369553
DOI: http://dx.doi.org/10.1111/sjos.12770

Similar Publications

Post-selection inference for the Cox model with interval-censored data.

Scand Stat Theory Appl

June 2025

Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA.

We develop a post-selection inference method for the Cox proportional hazards model with interval-censored data, which provides asymptotically valid p-values and confidence intervals conditional on the model selected by lasso. The method is based on a pivotal quantity that is shown to converge to a uniform distribution under local parameters. Our method involves estimation of the efficient information matrix, for which several approaches are proposed with proof of their consistency.
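
The key device named here, a pivot that becomes uniform once the selection event is conditioned on, can be seen in its simplest Gaussian form. The sketch below is a one-dimensional analogue only: a statistic z ~ N(mu, 1) is "selected" when it exceeds a hypothetical threshold, and the truncated-normal CDF evaluated at z serves as the pivot (the same truncated-normal construction used in lasso post-selection inference). It does not model interval censoring, the Cox likelihood, or the efficient information matrix; the function name, threshold, and simulated draws are assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def truncated_gaussian_pivot(z, mu, sigma, lower, upper=math.inf):
    """CDF at z of N(mu, sigma^2) truncated to [lower, upper].

    If z really follows that truncated law, the returned value is Uniform(0, 1).
    """
    a = STD_NORMAL.cdf((lower - mu) / sigma)
    b = STD_NORMAL.cdf((upper - mu) / sigma)
    return (STD_NORMAL.cdf((z - mu) / sigma) - a) / (b - a)


# Hypothetical selection rule: a statistic is reported only when z > threshold.
random.seed(7)
threshold = 1.0
selected = [z for z in (random.gauss(0.0, 1.0) for _ in range(20000)) if z > threshold]

# One-sided p-values for H0: mu = 0, first ignoring, then conditioning on, selection.
naive_p = [1.0 - STD_NORMAL.cdf(z) for z in selected]
conditional_p = [1.0 - truncated_gaussian_pivot(z, 0.0, 1.0, threshold) for z in selected]

naive_rejection_rate = sum(p < 0.05 for p in naive_p) / len(selected)
conditional_rejection_rate = sum(p < 0.05 for p in conditional_p) / len(selected)
print(f"naive: {naive_rejection_rate:.3f}, conditional: {conditional_rejection_rate:.3f}")
```

Under the null, the naive p-values reject far more often than the nominal 5% among selected statistics, while the conditional p-values stay near 5% because the pivot is exactly uniform given selection in this toy setting.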

Introduction: The association between air pollution and adverse health outcomes has been extensively documented, with oxidative stress widely considered a contributing factor. However, the precise underlying mechanisms remain unclear. Recent studies suggest that environmentally persistent free radicals (EPFRs) may provide the missing connection between air pollution and its detrimental health effects.

In a standard analysis, pleiotropic variants are identified by running separate genome-wide association studies (GWAS) and combining results across traits. However, such an approach, based on marginal summary statistics, may produce spurious results. We propose a new statistical approach, Debiased-regularized Factor Analysis Regression Model (DrFARM), through a joint regression model for simultaneous analysis of high-dimensional genetic variants and multilevel dependencies.

Simultaneously performing variable selection and inference in high-dimensional regression models is an open challenge in statistics and machine learning. The increasing availability of very large numbers of variables requires specific statistical procedures that accurately select the most important predictors in a high-dimensional space while controlling the false discovery rate (FDR) associated with the variable selection procedure. In this paper, we propose the joint adoption of the Mirror Statistic approach to FDR control, coupled with outcome randomisation to maximise the statistical power of the variable selection procedure, measured through the true positive rate.
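
A mirror statistic combines two independent coefficient estimates per variable so that null variables are roughly symmetric about zero; the negative tail then estimates the number of false discoveries in the positive tail. The sketch below assumes the two estimate vectors are already available (how they are produced, whether by data splitting or by the outcome randomisation described above, is not shown) and uses one common form, M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|). The cutoff is the smallest t with #{j: M_j < -t} / max(#{j: M_j > t}, 1) <= q. The function names and the example coefficients are hypothetical.

```python
import math


def mirror_statistics(beta1, beta2):
    """M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|) for two independent estimates."""
    stats = []
    for a, b in zip(beta1, beta2):
        prod = a * b
        sign = 1.0 if prod > 0 else (-1.0 if prod < 0 else 0.0)
        stats.append(sign * (abs(a) + abs(b)))
    return stats


def mirror_select(stats, q):
    """Indices with M_j > tau, where tau is the smallest cutoff whose estimated FDP <= q.

    Estimated FDP(t) = #{j: M_j < -t} / max(#{j: M_j > t}, 1): nulls are assumed
    symmetric about zero, so the negative tail stands in for the false positives.
    """
    # FDP(t) only changes at the |M_j| values, so scanning them (plus 0) suffices.
    for t in [0.0] + sorted(abs(m) for m in stats):
        false_est = sum(1 for m in stats if m < -t)
        discoveries = sum(1 for m in stats if m > t)
        if false_est / max(discoveries, 1) <= q:
            return [j for j, m in enumerate(stats) if m > t]
    return []


# Hypothetical estimates of the same 7 coefficients from two independent fits.
beta1 = [3.0, 2.5, 2.0, -0.1, 0.2, -0.3, 0.15]
beta2 = [2.8, 2.7, 1.9, 0.2, -0.1, -0.25, 0.1]
selected = mirror_select(mirror_statistics(beta1, beta2), q=0.2)
print(selected)
```

Variables whose two estimates disagree in sign receive negative mirror statistics, which is what lets the procedure estimate the false discovery proportion without knowing which variables are null.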