Publications by authors named "David B Dunson"

Factor analysis provides a canonical framework for imposing lower-dimensional structure, such as sparse covariance, in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in replication studies across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure.

While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings of a common set of entities for different data types. An example is clustering patients into subgroups, with subgroup membership varying according to the domain of the patient variables.

Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.

Results: We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores.
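
IPSS itself is defined in the paper; as a loose numerical illustration of the general idea (thresholding how often a feature's importance score ranks highly across subsamples), here is a minimal numpy sketch. The score function (absolute correlation), subsampling scheme, and threshold are simplifications chosen for illustration, not the actual IPSS procedure:

```python
import numpy as np

def selection_frequencies(X, y, n_subsamples=100, top_k=5, seed=0):
    """Fraction of random half-subsamples in which each feature ranks
    among the top_k by absolute correlation with the response."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs, ys = X[idx], y[idx]
        # importance score: absolute Pearson correlation with y
        scores = np.abs([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(p)])
        counts[np.argsort(scores)[-top_k:]] += 1
    return counts / n_subsamples

# toy data: only features 0 and 1 influence y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=200)
freq = selection_frequencies(X, y)
selected = np.where(freq >= 0.9)[0]  # features that are stably important
```

Noise features only compete for the leftover top-k slots, so their selection frequencies stay far below the threshold; the signal features are selected in essentially every subsample.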

Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also characterizing time-varying and interacting effects. Such inferences are made more challenging by correlations among exposures, nonlinearity, and the curse of dimensionality.

In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents, usually including a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leaves unclear guidance for specifying the underlying graphical model and makes results sensitive to the choice of graph.
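
A generic Vecchia-style construction of such a sparse DAG (not the specific methodology studied in the paper) can be sketched in a few lines: order the locations, then let each location's parents be its nearest neighbors among earlier locations, which makes the graph acyclic by construction:

```python
import numpy as np

def nn_dag_parents(coords, m=3):
    """For each location i (in the given ordering), return the indices of
    up to m nearest neighbors among previously ordered locations.
    Parents always precede children, so the graph is a DAG."""
    n = coords.shape[0]
    parents = []
    for i in range(n):
        if i == 0:
            parents.append(np.array([], dtype=int))
            continue
        d = np.linalg.norm(coords[:i] - coords[i], axis=1)
        parents.append(np.argsort(d)[: min(m, i)])
    return parents

rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))  # 50 random locations in the unit square
pa = nn_dag_parents(coords, m=3)
```

The sensitivity mentioned above enters through choices made here: the ordering of the locations and the number of parents m both change the graph.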

The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relationship between dynamically changing and high-dimensional metabolites and a longitudinal response. Important aspects of the analysis include the selection of the important set of metabolites and the accommodation of missing data in both response and covariate values.

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic and modifiable risk factors influence disease susceptibility are under intense investigation, with APOE being the major genetic risk factor for late onset AD. Yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear.

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic, fixed, and modifiable risk factors influence susceptibility to AD are under intense investigation, yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear. To model multiple risk factors, including APOE genotype, age, sex, diet, and immunity, we leveraged mice expressing the human APOE and NOS2 genes, which confer a reduced immune response compared to mouse Nos2.

Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model. However, severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large-scale data settings on which we focus. The scenario worsens in non-Gaussian models because the reduced analytical tractability leads to additional hurdles to computational efficiency. In this article, we introduce Bayesian models of spatially referenced data in which the likelihood or the latent process (or both) are not Gaussian.

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid.

It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, involving an integral over a multivariate normal distribution having no analytic form.
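
The intractable integral is a multivariate normal orthant probability. In two dimensions it has a closed form, P(Z1 > 0, Z2 > 0) = 1/4 + arcsin(rho)/(2*pi), which lets a crude Monte Carlo estimate be checked; in high dimensions only sampling-type approaches remain, which is exactly the scaling problem described above. A minimal sketch (illustrative, not the paper's algorithm):

```python
import numpy as np

def orthant_prob_mc(corr, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(Z > 0 componentwise) for Z ~ N(0, corr)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    Z = rng.standard_normal((n_draws, corr.shape[0])) @ L.T
    return np.mean(np.all(Z > 0, axis=1))

rho = 0.5
corr = np.array([[1.0, rho], [rho, 1.0]])
est = orthant_prob_mc(corr)
exact = 0.25 + np.arcsin(rho) / (2 * np.pi)  # equals 1/3 for rho = 0.5
```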

Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative approach, but such methods face computational problems and are highly sensitive to the choice of kernel.

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP).
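
One common way to build such a positive, smoothly time-varying rate is to pass covariate effects plus a smooth GP draw through an exponential link. The sketch below does this with a squared-exponential kernel and a toy intervention covariate; the kernel, link, and covariate are illustrative assumptions, not the authors' model:

```python
import numpy as np

def se_kernel(t, length_scale=10.0, var=0.25):
    """Squared-exponential covariance matrix over time points t."""
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
t = np.arange(100.0)
K = se_kernel(t) + 1e-6 * np.eye(100)                 # jitter for stability
f = np.linalg.cholesky(K) @ rng.standard_normal(100)  # smooth GP draw
x = (t > 50).astype(float)                            # toy covariate: intervention at t = 50
b0, b1 = np.log(2.0), -0.5                            # baseline and intervention effects
rate = np.exp(b0 + b1 * x + f)                        # transmission rate, positive by construction
```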

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs.

We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sample prediction of activity for chemicals not yet assayed, quantifies uncertainty of predicted activity, and adjusts for multiplicity in hypothesis testing. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been suggested by toxicologists.

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI (i) captures all types of dependence, both linear and nonlinear; (ii) is zero only when the random variables are independent; (iii) serves as a measure of relationship strength (similar to, but more general than, R²); and (iv) is interpreted the same way for numerical and categorical data.
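
Property (ii) can be checked directly from the definition MI = sum over (x, y) of p(x, y) * log( p(x, y) / (p(x) p(y)) ). A minimal discrete implementation:

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    nz = joint > 0                          # 0 * log 0 is treated as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

# independent variables: joint is the outer product of marginals -> MI = 0
indep = np.outer([0.3, 0.7], [0.4, 0.6])
# perfectly dependent binary variables (X = Y) -> MI = log 2
dep = np.array([[0.5, 0.0], [0.0, 0.5]])
```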

Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality.

Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial membership across clusters. With this flexibility come challenges in uniquely identifying, estimating, and interpreting the parameters.

We aim to model the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects.
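
Computing an accumulation curve from a sequence is straightforward (the species names below are invented for illustration):

```python
import numpy as np

def accumulation_curve(labels):
    """Number of distinct labels among the first n objects, for each n."""
    seen = set()
    curve = []
    for lab in labels:
        seen.add(lab)
        curve.append(len(seen))
    return np.array(curve)

sample = ["toucan", "macaw", "toucan", "antbird", "macaw", "tanager"]
curve = accumulation_curve(sample)  # -> [1, 2, 2, 3, 3, 4]
```

The curve is nondecreasing by construction; models of sequential discovery aim to predict how it flattens as sampling continues.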

We introduce a new class of semiparametric latent variable models for long memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity and long range dependence. This rules out Poisson process based models where the rate function itself is not long range dependent.

Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil key architectures of the criminal organization.

The selective vulnerability of brain networks in individuals at risk for Alzheimer's disease (AD) may help differentiate pathological from normal aging at asymptomatic stages, allowing the implementation of more effective interventions. We used a sample of 72 people across the age span, enriched for the APOE4 genotype to reveal vulnerable networks associated with a composite AD risk factor including age, genotype, and sex. Sparse canonical correlation analysis (CCA) revealed a high weight associated with genotype, and subgraphs involving the cuneus, temporal, cingulate cortices, and cerebellum.

High resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes are known to not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to methods for large scale data that allow the description of complex relationships between several outcomes recorded at high resolutions by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph.
