Publications by authors named "David Dunson"

Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in reproducing studies across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure.

There is abundant interest in assessing the joint effects of multiple exposures on human health. This is often referred to as the mixtures problem in environmental epidemiology and toxicology. Classically, studies have examined the adverse health effects of different chemicals one at a time, but there is concern that certain chemicals may act together to amplify each other's effects.

While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, patients may be clustered into subgroups, with subgroup membership varying according to the domain of the patient variables.

We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the pollutant spread, but information regarding prevailing wind directions may be missing or unreliable. We propose to map a discrete set of wind directions to edges in a sparse directed acyclic graph (DAG), accounting for uncertainty in directional correlation patterns across a domain.

Mapping of human brain structural connectomes via diffusion magnetic resonance imaging (dMRI) offers a unique opportunity to understand brain structural connectivity and relate it to various human traits, such as cognition. However, head displacement during image acquisition can compromise the accuracy of connectome reconstructions and subsequent inference results. We develop a generative model to learn low-dimensional representations of structural connectomes invariant to motion-induced artifacts, so that we can link brain networks and human traits more accurately, and generate motion-adjusted connectomes.

Background: Human exposure to complex, changing, and variably correlated mixtures of environmental chemicals has presented analytical challenges to epidemiologists and human health researchers. There has been a wide variety of recent advances in statistical methods for analyzing mixtures data, with most methods having open-source software for implementation. However, there is no one-size-fits-all method for analyzing mixtures data given the considerable heterogeneity in scientific focus and study design.

Motivation: Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery control, or (iii) identify few true positives.

Results: We introduce a general feature selection method with finite-sample false discovery control based on applying integrated path stability selection (IPSS) to arbitrary feature importance scores.

Developmental epidemiology commonly focuses on assessing the association between multiple early life exposures and childhood health. Statistical analyses of data from such studies focus on inferring the contributions of individual exposures, while also characterizing time-varying and interacting effects. Such inferences are made more challenging by correlations among exposures, nonlinearity, and the curse of dimensionality.

In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leads to unclear guidance in specifying the underlying graphical model and sensitivity to graph choice.
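
The parent-set construction described above can be illustrated with a minimal sketch. It assumes the locations are already ordered and that each location takes its `m` nearest previously ordered locations as parents; the function name `nearest_neighbor_parents` and the Euclidean distance are illustrative assumptions, not details taken from the article.

```python
import math

def nearest_neighbor_parents(coords, m):
    """For each location j, return up to m nearest locations among 0..j-1.

    Because parents always precede their child in the ordering, the
    resulting directed graph is acyclic by construction.
    """
    parents = []
    for j, point in enumerate(coords):
        # Rank all earlier locations by Euclidean distance to location j.
        earlier = sorted(range(j), key=lambda k: math.dist(point, coords[k]))
        parents.append(earlier[:m])
    return parents

coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
parents = nearest_neighbor_parents(coords, m=2)
```

Each location then conditions on at most `m` others, which is what keeps the graph sparse; the choice of ordering and of `m` is exactly the kind of graph specification the abstract says lacks theoretical guidance.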

Human exposure to complex, changing, and variably correlated mixtures of environmental chemicals has presented analytical challenges to epidemiologists and human health researchers. There has been a wide variety of recent advances in statistical methods for analyzing mixtures data, with most of these methods having open-source software for implementation. However, there is no one-size-fits-all method for analyzing mixtures data given the considerable heterogeneity in scientific focus and study design.

The article is motivated by an application to the EarlyBird cohort study aiming to explore how anthropometrics and clinical and metabolic processes are associated with obesity and glucose control during childhood. There is interest in inferring the relationship between dynamically changing and high-dimensional metabolites and a longitudinal response. Important aspects of the analysis include the selection of the important set of metabolites and the accommodation of missing data in both response and covariate values.

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic and modifiable risk factors influence disease susceptibility are under intense investigation, with APOE being the major genetic risk factor for late onset AD. Yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear.

Alzheimer's disease (AD) presents complex challenges due to its multifactorial nature, poorly understood etiology, and late detection. The mechanisms through which genetic, fixed and modifiable risk factors influence susceptibility to AD are under intense investigation, yet the impact of unique risk factors on brain networks is difficult to disentangle, and their interactions remain unclear. To model multiple risk factors including APOE genotype, age, sex, diet, and immunity, we leveraged mice expressing the human APOE and NOS2 genes, conferring a reduced immune response compared to mouse Nos2.

In this paper we predict sea surface salinity (SSS) in the Arctic Ocean based on satellite measurements. SSS is a crucial indicator for ongoing changes in the Arctic Ocean and can offer important insights about climate change. We particularly focus on areas of water mistakenly flagged as ice by satellite algorithms.

Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large scale data settings on which we focus. The scenario worsens in non-Gaussian models because the reduced analytical tractability leads to additional hurdles to computational efficiency. In this article, we introduce Bayesian models of spatially referenced data in which the likelihood or the latent process (or both) are not Gaussian.

Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better model and predict the dynamics of an epidemic, and provide insight into the efficacy of control and intervention strategies. We present a method for likelihood-based estimation of parameters in the stochastic susceptible-infected-removed model under a time-inhomogeneous transmission rate composed of piecewise constant components.
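
As a rough illustration of the forward model only (not the estimation method of the article), the sketch below simulates a stochastic susceptible-infected-removed process whose transmission rate is piecewise constant in time. The frequency-dependent infection rate `beta * S * I / N`, a positive removal rate `gamma`, and the names `beta_at` and `simulate_sir` are all assumptions made for this example.

```python
import random

def beta_at(t, change_points, rates):
    """Piecewise constant rate: rates[k] applies until change_points[k]."""
    k = 0
    while k < len(change_points) and t >= change_points[k]:
        k += 1
    return rates[k]

def simulate_sir(s, i, r, change_points, rates, gamma, t_end, rng):
    """Gillespie-style simulation; assumes gamma > 0 and sorted change_points."""
    n = s + i + r
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0 and t < t_end:
        infection = beta_at(t, change_points, rates) * s * i / n
        removal = gamma * i
        total = infection + removal
        # Next time at which the transmission rate changes (or the horizon).
        boundary = min([c for c in change_points if c > t] + [t_end])
        wait = rng.expovariate(total)
        if t + wait >= boundary:
            # Exponential waiting times are memoryless, so it is valid to
            # jump to the boundary and redraw under the new rate.
            t = boundary
            continue
        t += wait
        if rng.random() < infection / total:
            s, i = s - 1, i + 1   # infection event
        else:
            i, r = i - 1, r + 1   # removal event
        path.append((t, s, i, r))
    return path

path = simulate_sir(95, 5, 0, [10.0, 20.0], [0.5, 0.1, 0.3], 0.2, 30.0,
                    random.Random(0))
```

Here the rate is 0.5 before time 10, 0.1 between times 10 and 20, and 0.3 afterwards. Restarting the clock at each change point is one simple way to handle the time inhomogeneity exactly when the rate is piecewise constant.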

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid.

It has become increasingly common to collect high-dimensional binary response data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inferences. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, involving an integral over a multivariate normal distribution having no analytic form.

The brain structural connectome is generated by a collection of white matter fiber bundles constructed from diffusion weighted MRI (dMRI), acting as highways for neural activity. There has been abundant interest in studying how the structural connectome varies across individuals in relation to their traits, ranging from age and gender to neuropsychiatric outcomes. After applying tractography to dMRI to get white matter fiber bundles, a key question is how to represent the brain connectome to facilitate statistical analyses relating connectomes to traits.

Targeted brain stimulation has the potential to treat mental illnesses. We develop an approach to help design protocols by identifying relevant multi-region electrical dynamics. Our approach models these dynamics as a superposition of latent networks, where the latent variables predict a relevant outcome.

Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative approach, but such methods face computational problems and are highly sensitive to the choice of kernel.

The transmission rate is a central parameter in mathematical models of infectious disease. Its pivotal role in outbreak dynamics makes estimating the current transmission rate and uncovering its dependence on relevant covariates a core challenge in epidemiological research as well as public health policy evaluation. Here, we develop a method for flexibly inferring a time-varying transmission rate parameter, modeled as a function of covariates and a smooth Gaussian process (GP).

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs.

We aim to infer bioactivity of each chemical by assay endpoint combination, addressing sparsity of toxicology data. We propose a Bayesian hierarchical framework which borrows information across different chemicals and assay endpoints, facilitates out-of-sample prediction of activity for chemicals not yet assayed, quantifies uncertainty of predicted activity, and adjusts for multiplicity in hypothesis testing. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been suggested by toxicologists.

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data.
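
To make these properties concrete, here is a minimal sketch of a plug-in MI estimate for two discrete variables, computed from empirical frequencies. The function name `mutual_information` is hypothetical, and this simple estimator is an assumption for illustration; it is biased in small samples and is not claimed to be the estimator used in the article.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each observed (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # (c / n) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mi_identical = mutual_information(["a", "b", "a", "b"], ["a", "b", "a", "b"])
```

In the first call the empirical joint distribution factorizes, so the estimate is 0; in the second the variables are identical and balanced, so the estimate equals log 2. The same function accepts numeric or categorical labels, which reflects property (iv).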
