Evaluating the Role of High-Dimensional Proxy Data in Confounding Adjustment in Multiple Sclerosis Research: A Case Study.

Pharmacoepidemiol Drug Saf

Division of Neurology, Department of Medicine, The Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada.

Published: February 2025


Category Ranking: 98% · Total Visits: 921 · Avg Visit Duration: 2 minutes · Citations: 20

Article Abstract

Purpose: Given the historical use of limited confounders in multiple sclerosis (MS) studies utilizing administrative health data, this brief report evaluates the impact of incorporating high-dimensional proxy information on confounder adjustment in MS research. We implemented high-dimensional propensity score (hdPS) and high-dimensional disease risk score (hdDRS) methods to assess changes in effect estimates for the association between disease-modifying drugs (DMDs) and all-cause mortality in an MS cohort from British Columbia (BC), Canada.

Methods: We conducted a population-based retrospective study using linked administrative databases from BC, including health insurance registries, demographics, physician visits, hospitalizations, prescriptions, and vital statistics. The cohort comprised 19 360 individuals with MS, followed from January 1, 1996, to December 31, 2017. DMD exposure was defined as at least 180 days of use for beta-interferon or glatiramer acetate, or at least 90 days for other DMDs. The outcome was time to all-cause mortality. We compared Cox proportional hazards models adjusting for investigator-specified covariates with those incorporating additional empirical covariates using hdPS and hdDRS methods.
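As a rough illustration of this kind of workflow, the sketch below generates binary proxy covariates from code counts, ranks them with the Bross bias formula used in the hdPS algorithm of Schneeweiss et al. (2009), and fits a Cox proportional hazards model. The toy data and every parameter choice are illustrative assumptions, not the study's actual pipeline (which summarized the selected proxies through propensity and disease risk scores rather than entering them directly).

```python
# Minimal sketch of high-dimensional proxy adjustment, loosely following the
# hdPS algorithm of Schneeweiss et al. (2009). All names, sizes, and the toy
# data below are illustrative assumptions, not the authors' actual pipeline.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, n_codes = 5000, 200

# Toy cohort: exposure (DMD use), follow-up time, death indicator, and a
# matrix of proxy code counts (diagnoses/procedures/prescriptions).
exposed = rng.integers(0, 2, n)
codes = rng.poisson(0.2, size=(n, n_codes))
time = rng.exponential(10, n)
death = rng.integers(0, 2, n)

# Step 1: dichotomize each code ("recorded at least once" for brevity; the
# full algorithm also builds sporadic/frequent indicators from percentiles).
ever = (codes > 0).astype(int)

# Step 2: prioritize codes with the Bross bias formula, using the prevalence
# of each proxy among exposed (pc1) and unexposed (pc0) and its crude
# relative risk for the outcome (rr_cd).
def bross_bias(proxy, exposure, outcome):
    pc1 = proxy[exposure == 1].mean()
    pc0 = proxy[exposure == 0].mean()
    p1 = outcome[proxy == 1].mean()
    p0 = outcome[proxy == 0].mean()
    rr_cd = max(p1 / max(p0, 1e-8), 1e-8)
    return abs(np.log((pc1 * (rr_cd - 1) + 1) / (pc0 * (rr_cd - 1) + 1)))

scores = [bross_bias(ever[:, j], exposed, death) for j in range(n_codes)]
top = np.argsort(scores)[::-1][:50]  # keep the 50 highest-ranked proxies

# Step 3: fit a Cox proportional hazards model with exposure plus the
# empirically selected covariates.
df = pd.DataFrame(ever[:, top], columns=[f"proxy_{j}" for j in top])
df["exposed"], df["time"], df["death"] = exposed, time, death
cph = CoxPHFitter(penalizer=0.1).fit(df, duration_col="time", event_col="death")
print(cph.hazard_ratios_["exposed"])
```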

Results: In the unadjusted analysis, DMD exposure was associated with a 69% lower risk of mortality (HR 0.31; 95% CI: 0.27-0.36). After adjusting for investigator-specified covariates, the adjusted hazard ratio (aHR) was 0.76 (95% CI: 0.65-0.89). The hdPS analyses showed a 20%-23% lower mortality risk (aHRs 0.77-0.80), while the hdDRS analyses indicated a 19%-21% reduction (aHRs 0.79-0.81).
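The percentage reductions quoted above follow directly from the hazard ratios: an HR of h corresponds to a (1 - h) × 100% lower hazard. A quick arithmetic check (no study data involved):

```python
# Percent risk reduction implied by a hazard ratio: (1 - HR) * 100.
for hr in (0.31, 0.76, 0.77, 0.80, 0.79, 0.81):
    print(f"HR {hr:.2f} -> {100 * (1 - hr):.0f}% lower hazard")
# HR 0.31 -> 69% lower hazard, matching the unadjusted estimate above.
```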

Conclusions: Incorporating high-dimensional proxy information produced only minor variations in effect estimates compared with traditional covariate adjustment. These findings suggest that residual confounding may have only a modest impact on the question under consideration. Further research should explore additional data dimensions and replicate these findings across different datasets.


Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791124
DOI: http://dx.doi.org/10.1002/pds.70112

Publication Analysis

Top Keywords

high-dimensional proxy (12)
multiple sclerosis (8)
incorporating high-dimensional (8)
all-cause mortality (8)
dmd exposure (8)
adjusting investigator-specified (8)
investigator-specified covariates (8)
high-dimensional (5)
evaluating role (4)
role high-dimensional (4)

Similar Publications

Autonomous systems operating in high-dimensional environments increasingly rely on prioritization heuristics to allocate attention and assess risk, yet these mechanisms can introduce cognitive biases such as salience, spatial framing, and temporal familiarity that influence decision-making without altering the input or accessing internal states. This study presents Priority Inversion via Operational Reasoning (PRIOR), a black-box, non-perturbative diagnostic framework that employs structurally biased but semantically neutral scenario cues to probe inference-level vulnerabilities without modifying pixel-level, statistical, or surface semantic properties. Given the limited accessibility of embodied vision-based systems, we evaluate PRIOR using large language models (LLMs) as abstract reasoning proxies to simulate cognitive prioritization in constrained textual surveillance scenarios inspired by Unmanned Aerial Vehicle (UAV) operations.


Detection and evaluation of clusters within sequential data.

Data Min Knowl Discov

August 2025

Department of Mathematics & Computer Science, TU/e, Eindhoven, The Netherlands.

Unlabelled: Sequential data is ubiquitous: it is routinely gathered to gain insights into complex processes such as behavioral, biological, or physical processes. Challengingly, such data not only has dependencies within the observed sequences, but the observations are also often high-dimensional, sparse, and noisy. These are all difficulties that obscure the inner workings of the complex process under study.


Unsupervised evaluation of pre-trained DNA language model embeddings.

BMC Genomics

August 2025

Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Wolstein Research Building, Cleveland, OH, USA.

Background: DNA Language Models (DLMs) have generated a lot of hope and hype for solving complex genetics tasks. These models have demonstrated remarkable performance in tasks like gene finding, enhancer annotation, and histone modification. However, they have struggled with tasks such as learning individual-level personal transcriptome variation, highlighting the need for robust evaluation approaches.


Background: Glucagon-like Peptide-1 Receptor Agonists (GLP1RA) may reduce asthma exacerbation (AE) risk, but it is unclear which populations benefit most. Recent pharmacoepidemiologic studies have employed iterative causal forest (iCF), a machine learning (ML) algorithm, to identify subgroups with heterogeneous treatment effects (HTEs). While iCF does not rely on prior knowledge of treatment-variable interactions, it may be constrained by missing or poorly defined variables in pharmacoepidemiologic studies.


Natural language processing for scalable feature engineering and ultra-high-dimensional confounding adjustment in healthcare database studies.

J Biomed Inform

September 2025

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. Electronic address:

Background: To improve confounding control in healthcare database studies, data-driven algorithms may empirically identify and adjust for large numbers of pre-exposure variables that indirectly capture information on unmeasured confounding factors ('proxy' confounders). Current approaches for high-dimensional proxy adjustment do not leverage free-text notes from electronic health records (EHRs). Unsupervised natural language processing (NLP) technology can scale to generate large numbers of structured features from unstructured notes.
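As a loose sketch of the kind of unsupervised feature generation this abstract describes, the snippet below converts free-text notes into high-dimensional binary "mentioned at least once" indicators with scikit-learn's CountVectorizer; such columns could then feed the same proxy-prioritization step used for claims-based covariates. The example notes and parameter choices are invented for illustration, not taken from the paper.

```python
# Hedged sketch: unsupervised generation of structured features from
# unstructured clinical notes. Notes and parameters are illustrative.
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "pt reports fatigue and gait instability, on interferon",
    "follow-up visit, stable relapsing remitting ms, no new lesions",
    "admitted with urinary tract infection, baseline disability high",
]

# Unigrams and bigrams, binarized to "mentioned at least once" indicators.
vec = CountVectorizer(ngram_range=(1, 2), binary=True, min_df=1)
X = vec.fit_transform(notes)
print(X.shape, vec.get_feature_names_out()[:5])
```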
