Evaluating the Role of High-Dimensional Proxy Data in Confounding Adjustment in Multiple Sclerosis Research: A Case Study.

Pharmacoepidemiol Drug Saf

Division of Neurology, Department of Medicine, The Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada.

Published: February 2025


Category Ranking: 98% · Total Visits: 921 · Avg Visit Duration: 2 minutes · Citations: 20

Article Abstract

Purpose: Given the historical use of limited confounders in multiple sclerosis (MS) studies utilizing administrative health data, this brief report evaluates the impact of incorporating high-dimensional proxy information on confounder adjustment in MS research. We implemented high-dimensional propensity score (hdPS) and high-dimensional disease risk score (hdDRS) methods to assess changes in effect estimates for the association between disease-modifying drugs (DMDs) and all-cause mortality in an MS cohort from British Columbia (BC), Canada.

Methods: We conducted a population-based retrospective study using linked administrative databases from BC, including health insurance registries, demographics, physician visits, hospitalizations, prescriptions, and vital statistics. The cohort comprised 19 360 individuals with MS, followed from January 1, 1996, to December 31, 2017. DMD exposure was defined as at least 180 days of use for beta-interferon or glatiramer acetate, or at least 90 days for other DMDs. The outcome was time to all-cause mortality. We compared Cox proportional hazards models adjusting for investigator-specified covariates with those incorporating additional empirical covariates using hdPS and hdDRS methods.
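As a rough illustration of this kind of workflow, the sketch below generates binary proxy covariates from code counts, ranks them with the Bross bias formula used in the hdPS algorithm of Schneeweiss et al. (2009), and fits a Cox proportional hazards model. The toy data and every parameter choice are illustrative assumptions, not the study's actual pipeline (which summarized the selected proxies through propensity and disease risk scores rather than entering them directly).

```python
# Minimal sketch of high-dimensional proxy adjustment, loosely following the
# hdPS algorithm of Schneeweiss et al. (2009). All names, sizes, and the toy
# data below are illustrative assumptions, not the authors' actual pipeline.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, n_codes = 5000, 200

# Toy cohort: exposure (DMD use), follow-up time, death indicator, and a
# matrix of proxy code counts (diagnoses/procedures/prescriptions).
exposed = rng.integers(0, 2, n)
codes = rng.poisson(0.2, size=(n, n_codes))
time = rng.exponential(10, n)
death = rng.integers(0, 2, n)

# Step 1: dichotomize each code ("recorded at least once" for brevity; the
# full algorithm also builds sporadic/frequent indicators from percentiles).
ever = (codes > 0).astype(int)

# Step 2: prioritize codes with the Bross bias formula, using the prevalence
# of each proxy among exposed (pc1) and unexposed (pc0) and its crude
# relative risk for the outcome (rr_cd).
def bross_bias(proxy, exposure, outcome):
    pc1 = proxy[exposure == 1].mean()
    pc0 = proxy[exposure == 0].mean()
    p1 = outcome[proxy == 1].mean()
    p0 = outcome[proxy == 0].mean()
    rr_cd = max(p1 / max(p0, 1e-8), 1e-8)
    return abs(np.log((pc1 * (rr_cd - 1) + 1) / (pc0 * (rr_cd - 1) + 1)))

scores = [bross_bias(ever[:, j], exposed, death) for j in range(n_codes)]
top = np.argsort(scores)[::-1][:50]  # keep the 50 highest-ranked proxies

# Step 3: fit a Cox proportional hazards model with exposure plus the
# empirically selected covariates.
df = pd.DataFrame(ever[:, top], columns=[f"proxy_{j}" for j in top])
df["exposed"], df["time"], df["death"] = exposed, time, death
cph = CoxPHFitter(penalizer=0.1).fit(df, duration_col="time", event_col="death")
print(cph.hazard_ratios_["exposed"])
```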

Results: In the unadjusted analysis, DMD exposure was associated with a 69% lower risk of mortality (HR 0.31; 95% CI: 0.27-0.36). After adjusting for investigator-specified covariates, the adjusted hazard ratio (aHR) was 0.76 (95% CI: 0.65-0.89). The hdPS analyses showed a 20%-23% lower mortality risk (aHRs 0.77-0.80), while the hdDRS analyses indicated a 19%-21% reduction (aHRs 0.79-0.81).
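The percentage reductions quoted above follow directly from the hazard ratios: an HR of h corresponds to a (1 - h) × 100% lower hazard. A quick arithmetic check (no study data involved):

```python
# Percent risk reduction implied by a hazard ratio: (1 - HR) * 100.
for hr in (0.31, 0.76, 0.77, 0.80, 0.79, 0.81):
    print(f"HR {hr:.2f} -> {100 * (1 - hr):.0f}% lower hazard")
# HR 0.31 -> 69% lower hazard, matching the unadjusted estimate above.
```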

Conclusions: Incorporating high-dimensional proxy information produced only minor variations in effect estimates compared with traditional covariate adjustment. These findings suggest that residual confounding may have only a modest impact on the question under consideration. Further research should explore additional data dimensions and replicate these findings across different datasets.


Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791124
DOI: http://dx.doi.org/10.1002/pds.70112

Publication Analysis

Top Keywords

high-dimensional proxy (12)
multiple sclerosis (8)
incorporating high-dimensional (8)
all-cause mortality (8)
dmd exposure (8)
adjusting investigator-specified (8)
investigator-specified covariates (8)
high-dimensional (5)
evaluating role (4)
role high-dimensional (4)

Similar Publications

Autonomous systems operating in high-dimensional environments increasingly rely on prioritization heuristics to allocate attention and assess risk, yet these mechanisms can introduce cognitive biases such as salience, spatial framing, and temporal familiarity that influence decision-making without altering the input or accessing internal states. This study presents Priority Inversion via Operational Reasoning (PRIOR), a black-box, non-perturbative diagnostic framework that employs structurally biased but semantically neutral scenario cues to probe inference-level vulnerabilities without modifying pixel-level, statistical, or surface semantic properties. Given the limited accessibility of embodied vision-based systems, we evaluate PRIOR using large language models (LLMs) as abstract reasoning proxies to simulate cognitive prioritization in constrained textual surveillance scenarios inspired by Unmanned Aerial Vehicle (UAV) operations.


Detection and evaluation of clusters within sequential data.

Data Min Knowl Discov

August 2025

Department of Mathematics & Computer Science, TU/e, Eindhoven, The Netherlands.

Unlabelled: Sequential data is ubiquitous: it is routinely gathered to gain insights into complex processes such as behavioral, biological, or physical processes. Challengingly, such data not only has dependencies within the observed sequences, but the observations are also often high-dimensional, sparse, and noisy. These are all difficulties that obscure the inner workings of the complex process under study.


Unsupervised evaluation of pre-trained DNA language model embeddings.

BMC Genomics

August 2025

Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Wolstein Research Building, Cleveland, OH, USA.

Background: DNA Language Models (DLMs) have generated a lot of hope and hype for solving complex genetics tasks. These models have demonstrated remarkable performance in tasks like gene finding, enhancer annotation, and histone modification. However, they have struggled with tasks such as learning individual-level personal transcriptome variation, highlighting the need for robust evaluation approaches.


Background: Glucagon-like Peptide-1 Receptor Agonists (GLP1RA) may reduce asthma exacerbation (AE) risk, but it is unclear which populations benefit most. Recent pharmacoepidemiologic studies have employed iterative causal forest (iCF), a machine learning (ML) algorithm, to identify subgroups with heterogeneous treatment effects (HTEs). While iCF does not rely on prior knowledge of treatment-variable interactions, it may be constrained by missing or poorly defined variables in pharmacoepidemiologic studies.


Natural language processing for scalable feature engineering and ultra-high-dimensional confounding adjustment in healthcare database studies.

J Biomed Inform

September 2025

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. Electronic address:

Background: To improve confounding control in healthcare database studies, data-driven algorithms may empirically identify and adjust for large numbers of pre-exposure variables that indirectly capture information on unmeasured confounding factors ('proxy' confounders). Current approaches for high-dimensional proxy adjustment do not leverage free-text notes from electronic health records (EHRs). Unsupervised natural language processing (NLP) technology can scale to generate large numbers of structured features from unstructured notes.
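As a loose sketch of the kind of unsupervised feature generation this abstract describes, the snippet below converts free-text notes into high-dimensional binary "mentioned at least once" indicators with scikit-learn's CountVectorizer; such columns could then feed the same proxy-prioritization step used for claims-based covariates. The example notes and parameter choices are invented for illustration, not taken from the paper.

```python
# Hedged sketch: unsupervised generation of structured features from
# unstructured clinical notes. Notes and parameters are illustrative.
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "pt reports fatigue and gait instability, on interferon",
    "follow-up visit, stable relapsing remitting ms, no new lesions",
    "admitted with urinary tract infection, baseline disability high",
]

# Unigrams and bigrams, binarized to "mentioned at least once" indicators.
vec = CountVectorizer(ngram_range=(1, 2), binary=True, min_df=1)
X = vec.fit_transform(notes)
print(X.shape, vec.get_feature_names_out()[:5])
```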
