Publications by David Dunson | LitMetric

Publications by authors named "David Dunson"

Inferring Covariance Structure from Multiple Data Sources via Subspace Factor Analysis.

Noirrit Kiran Chandra, David B Dunson, Jason Xu

J Am Stat Assoc

June 2025

Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in reproducing studies across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure.

Inferring Synergistic and Antagonistic Interactions in Mixtures of Exposures.

Shounak Chattopadhyay, Stephanie M Engel, David Dunson

Ann Appl Stat

March 2025

There is abundant interest in assessing the joint effects of multiple exposures on human health. This is often referred to as the mixtures problem in environmental epidemiology and toxicology. Classically, studies have examined the adverse health effects of different chemicals one at a time, but there is concern that certain chemicals may act together to amplify each other's effects.

Product Centred Dirichlet Processes for Bayesian Multiview Clustering.

Alexander Dombowsky, David B Dunson

J R Stat Soc Series B Stat Methodol

April 2025

While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, patients may be clustered into subgroups, with subgroup membership varying according to the domain of the patient variables.

Bag of DAGs: Inferring Directional Dependence in Spatiotemporal Processes.

Bora Jin, Michele Peruzzi, David Dunson

Bayesian Anal

November 2024

We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the pollutant spread, but information regarding prevailing wind directions may be missing or unreliable. We propose to map a discrete set of wind directions to edges in a sparse directed acyclic graph (DAG), accounting for uncertainty in directional correlation patterns across a domain.

Motion-invariant variational autoencoding of brain structural connectomes.

Yizi Zhang, Meimei Liu, Zhengwu Zhang, David Dunson

Imaging Neurosci (Camb)

October 2024

Mapping of human brain structural connectomes via diffusion magnetic resonance imaging (dMRI) offers a unique opportunity to understand brain structural connectivity and relate it to various human traits, such as cognition. However, head displacement during image acquisition can compromise the accuracy of connectome reconstructions and subsequent inference results. We develop a generative model to learn low-dimensional representations of structural connectomes invariant to motion-induced artifacts, so that we can link brain networks and human traits more accurately, and generate motion-adjusted connectomes.

Workflow for Statistical Analysis of Environmental Mixtures.

Bonnie R Joubert, Glenn Palmer, David Dunson, Marianthi-Anna Kioumourtzoglou, Brent A Coull

Environ Health Perspect

June 2025

Background: Human exposure to complex, changing, and variably correlated mixtures of environmental chemicals has presented analytical challenges to epidemiologists and human health researchers. There has been a wide variety of recent advances in statistical methods for analyzing mixtures data, with most methods having open-source software for implementation. However, there is no one-size-fits-all method for analyzing mixtures data given the considerable heterogeneity in scientific focus and study design.

Nonparametric IPSS: fast, flexible feature selection with false discovery control.

Omar Melikechi, David B Dunson, Jeffrey W Miller

Bioinformatics

May 2025

Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.

Results: We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores.

Low-Rank Longitudinal Factor Regression with Application to Chemical Mixtures.

Glenn Palmer, Amy H Herring, David B Dunson

Ann Appl Stat

March 2025

Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also characterizing time-varying and interacting effects. Such inferences are made more challenging by correlations among exposures, nonlinearity, and the curse of dimensionality.

Radial Neighbors for Provably Accurate Scalable Approximations of Gaussian Processes.

Yichen Zhu, Michele Peruzzi, Cheng Li, David B Dunson

Biometrika

December 2024

In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leads to unclear guidance in specifying the underlying graphical model and sensitivity to graph choice.

Environmental Mixtures Analysis (E-MIX) Workflow and Methods Repository.

Bonnie R Joubert, Glenn Palmer, David Dunson, Marianthi-Anna Kioumourtzoglou, Brent A Coull

medRxiv

December 2024

Human exposure to complex, changing, and variably correlated mixtures of environmental chemicals has presented analytical challenges to epidemiologists and human health researchers. There has been a wide variety of recent advances in statistical methods for analyzing mixtures data, with most of these methods having open-source software for implementation. However, there is no one-size-fits-all method for analyzing mixtures data given the considerable heterogeneity in scientific focus and study design.

Bayesian semiparametric inference in longitudinal metabolomics data.

Abhra Sarkar, Ornella Cominetti, Ivan Montoliu, Joanne Hosking, Jonathan Pinkney, David B Dunson

Sci Rep

December 2024

The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relationship between dynamically changing and high-dimensional metabolites and a longitudinal response. Important aspects of the analysis include the selection of the important set of metabolites and the accommodation of missing data in both response and covariate values.

Brain network fingerprints of Alzheimer's disease risk factors in mouse models with humanized APOE alleles.

Steven Winter, Ali Mahzarnia, Robert J Anderson, Zay Yar Han, Jessica Tremblay, David B Dunson

Magn Reson Imaging

December 2024

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic and modifiable risk factors influence disease susceptibility are under intense investigation, with APOE being the major genetic risk factor for late onset AD. Yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear.

APOE, Immune Factors, Sex, and Diet Interact to Shape Brain Networks in Mouse Models of Aging.

Steven Winter, Ali Mahzarnia, Robert J Anderson, Zay Yar Han, Jessica Tremblay, David B Dunson

bioRxiv

July 2024

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic, fixed and modifiable risk factors influence susceptibility to AD are under intense investigation, yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear. To model multiple risk factors including APOE genotype, age, sex, diet, and immunity, we leveraged mice expressing the human APOE and NOS2 genes, conferring a reduced immune response compared to mouse Nos2.

Spatial Predictions on Physically Constrained Domains: Applications to Arctic Sea Salinity Data.

Bora Jin, Amy H Herring, David Dunson

Ann Appl Stat

June 2024

In this paper we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator for ongoing changes in the Arctic Ocean and can offer important insights about climate change. We particularly focus on areas of water mistakenly flagged as ice by satellite algorithms.

Spatial meshing for general Bayesian multivariate models.

Michele Peruzzi, David B Dunson

J Mach Learn Res

March 2024

Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large scale data settings on which we focus. The scenario worsens in non-Gaussian models because the reduced analytical tractability leads to additional hurdles to computational efficiency. In this article, we introduce Bayesian models of spatially referenced data in which the likelihood or the latent process (or both) are not Gaussian.

Detecting changes in the transmission rate of a stochastic epidemic model.

Jenny Huang, Raphaël Morsomme, David Dunson, Jason Xu

Stat Med

May 2024

Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better model and predict the dynamics of an epidemic, and provide insight into the efficacy of control and intervention strategies. We present a method for likelihood-based estimation of parameters in the stochastic susceptible-infected-removed model under a time-inhomogeneous transmission rate comprised of piecewise constant components.
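
The model described above can be illustrated with a short simulation. The sketch below draws one path of a stochastic susceptible-infected-removed model whose transmission rate is piecewise constant in time. It only simulates the model and is not the likelihood-based estimation method of the article. The function names, the frequency-dependent infection rate beta * S * I / N, and the parameter values are assumptions made for illustration.

```python
import random


def beta_at(t, change_points, betas):
    """Piecewise-constant transmission rate.

    betas[0] applies before change_points[0]; betas[k] applies from
    change_points[k-1] onward (until the next change point, if any).
    """
    k = 0
    while k < len(change_points) and t >= change_points[k]:
        k += 1
    return betas[k]


def simulate_sir(s0, i0, n, change_points, betas, gamma, t_end, seed=0):
    """Simulate one stochastic SIR path with a piecewise-constant rate.

    Assumed event rates (illustrative): infection beta(t) * S * I / n,
    removal gamma * I. Returns a list of (time, S, I, R) tuples.
    """
    rng = random.Random(seed)
    t, s, i, r = 0.0, s0, i0, n - s0 - i0
    path = [(t, s, i, r)]
    while i > 0 and t < t_end:
        infect = beta_at(t, change_points, betas) * s * i / n
        recover = gamma * i
        total = infect + recover
        # Next time at which the rate changes, or the end of the window.
        boundary = min([c for c in change_points if c > t] + [t_end])
        wait = rng.expovariate(total)
        if t + wait >= boundary:
            # No event before the boundary. Exponential waiting times are
            # memoryless, so restart the clock there with the new rate.
            t = boundary
            continue
        t += wait
        if rng.random() < infect / total:
            s, i = s - 1, i + 1   # infection event
        else:
            i, r = i - 1, r + 1   # removal event
        path.append((t, s, i, r))
    return path


path = simulate_sir(s0=95, i0=5, n=100, change_points=[2.0],
                    betas=[1.5, 0.3], gamma=0.5, t_end=10.0, seed=1)
```

Restarting at each change point keeps the simulation exact for a piecewise-constant rate, at the cost of one extra random draw per boundary.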

Ellipsoid fitting with the Cayley transform.

Omar Melikechi, David B Dunson

IEEE Trans Signal Process

November 2023

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid.
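
The abstract does not spell out how the Cayley transform is used, so the following is only a sketch of the transform itself, not of CTEF. The Cayley transform sends a skew-symmetric matrix S to the rotation matrix Q = (I - S)(I + S)^{-1}. Pairing such a rotation with positive axis lengths always yields a positive definite matrix, which is one plausible reason a parameterization of this kind returns only ellipsoids. The two-dimensional setting and all function names are assumptions made for illustration.

```python
def cayley_2d(s):
    """Cayley transform of the skew-symmetric matrix S = [[0, -s], [s, 0]].

    Returns Q = (I - S)(I + S)^{-1} in closed form; Q is a rotation matrix.
    """
    d = 1.0 + s * s
    c = (1.0 - s * s) / d
    t = 2.0 * s / d
    return [[c, t], [-t, c]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    """Transpose of a 2x2 matrix stored as nested lists."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def ellipse_matrix(s, a, b):
    """Matrix A of the ellipse {x : x^T A x = 1} with semi-axes a, b > 0.

    The orientation comes from the Cayley parameter s, so every real s and
    every positive pair (a, b) gives a symmetric positive definite A.
    """
    q = cayley_2d(s)
    d = [[1.0 / a ** 2, 0.0], [0.0, 1.0 / b ** 2]]
    return matmul(matmul(q, d), transpose(q))


Q = cayley_2d(0.7)
A = ellipse_matrix(0.7, 3.0, 1.0)
```

Because s ranges over all real numbers, the orientation can be optimized without constraints, while the result stays an ellipse by construction.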

Bayesian inference on high-dimensional multivariate binary responses.

Antik Chakraborty, Rihui Ou, David B Dunson

J Am Stat Assoc

November 2023

It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, involving an integral over a multivariate normal distribution having no analytic form.

Tree representations of brain structural connectivity via persistent homology.

Didong Li, Phuc Nguyen, Zhengwu Zhang, David Dunson

Front Neurosci

October 2023

The brain structural connectome is generated by a collection of white matter fiber bundles constructed from diffusion weighted MRI (dMRI), acting as highways for neural activity. There has been abundant interest in studying how the structural connectome varies across individuals in relation to their traits, ranging from age and gender to neuropsychiatric outcomes. After applying tractography to dMRI to get white matter fiber bundles, a key question is how to represent the brain connectome to facilitate statistical analyses relating connectomes to traits.

Estimating a brain network predictive of stress and genotype with supervised autoencoders.

Austin Talbot, David Dunson, Kafui Dzirasa, David Carlson

J R Stat Soc Ser C Appl Stat

August 2023

Targeted brain stimulation has the potential to treat mental illnesses. We develop an approach to help design protocols by identifying relevant multi-region electrical dynamics. Our approach models these dynamics as a superposition of latent networks, where the latent variables predict a relevant outcome.

A generalized Bayes framework for probabilistic clustering.

Tommaso Rigon, Amy H Herring, David B Dunson

Biometrika

September 2023

Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative approach, but such methods face computational problems and are highly sensitive to the choice of kernel.

Explaining transmission rate variations and forecasting epidemic spread in multiple regions with a semiparametric mixed effects SIR model.

David A Buch, James E Johndrow, David B Dunson

Biometrics

December 2023

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP).

PPA: Principal parcellation analysis for brain connectomes and multiple traits.

Rongjie Liu, Meng Li, David B Dunson

Neuroimage

August 2023

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs.

Bayesian matrix completion for hypothesis testing.

Bora Jin, David B Dunson, Julia E Rager, David M Reif, Stephanie M Engel

J R Stat Soc Ser C Appl Stat

May 2023

We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sample prediction of activity for chemicals not yet assayed, quantifies uncertainty of predicted activity, and adjusts for multiplicity in hypothesis testing. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been suggested by toxicologists.

Mutual information: Measuring nonlinear dependence in longitudinal epidemiological data.

Alexander L Young, Willem van den Boom, Rebecca A Schroeder, Vijay Krishnamoorthy, Karthik Raghunathan, David B Dunson

PLoS One

April 2023

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data.
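
Properties (i) and (ii) can be checked on toy data. The sketch below uses the plug-in estimate for discrete variables, MI = sum over (x, y) of p(x, y) * log[p(x, y) / (p(x) p(y))], computed from observed frequencies. This estimator and the function name are assumptions made for illustration and are not taken from the article, which concerns longitudinal data.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two discrete sequences.

    Works for any hashable labels, numerical or categorical.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Independent pair: every (x, y) combination occurs equally often.
x_ind = [0, 0, 1, 1]
y_ind = [0, 1, 0, 1]

# Nonlinear dependence: y = x^2 has zero linear correlation with x here,
# yet y is completely determined by x.
x_dep = [-1, 0, 1, -1, 0, 1]
y_dep = [v * v for v in x_dep]

mi_ind = mutual_information(x_ind, y_ind)   # 0.0
mi_dep = mutual_information(x_dep, y_dep)   # about 0.6365 nats
```

The second pair has zero sample correlation but positive MI, which is the sense in which MI captures dependence that correlation misses.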
