Publications by authors named "Benjamin B Chu"

The standard analysis pipeline for genome-wide association studies (GWAS) is based on marginal tests of association. These are computationally convenient and portable, but the discoveries resulting from their rejections are not immediately interpretable and require post-processing such as "clumping" and "fine mapping." An interesting alternative is provided by conditional independence hypotheses: their rejections identify distinct signals across the genome, account for measured confounders, and point to separate causal pathways.


Motivation: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance.

Results: While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.
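The false discovery rate guarantee mentioned in the Motivation comes from the selection step of the knockoff filter. The sketch below illustrates the standard "knockoff+" threshold on a vector of feature statistics `W`, where a large positive `W[j]` is evidence that variable `j` matters and signs are symmetric under the null. How `W` is computed (knockoff construction, model fitting) is not covered here, and the function names are hypothetical, not taken from the authors' software.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.unique(np.abs(W[W != 0]))  # np.unique returns sorted values
    for t in candidates:
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return np.inf  # no threshold meets the target level: select nothing

def knockoff_select(W, q):
    """Indices of variables selected at target FDR level q."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))

# Toy statistics (made up for illustration only).
W_demo = [10, 9, 8, 7, 6, -5, 4, -3, 2, -1]
print(knockoff_select(W_demo, q=0.25))
```

Negative statistics act as an estimate of how many false positives sit above the threshold, which is why highly correlated regressors, whose statistics become small or unstable, reduce the number of selections.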


We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations.


Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.


Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.

Article Synopsis
  • Analyzing multiple correlated traits in genome-wide association studies is more effective than analyzing them individually, but traditional methods are computationally heavy.
  • A new algorithm called MendelIHT, implemented in a Julia package, uses iterative hard thresholding to improve efficiency and accuracy in these analyses, outperforming existing methods like GEMMA and mv-PLINK in speed and error rates.
  • The software and its documentation are freely available online, enabling researchers to conduct large-scale trait analyses with significant reductions in computation time and resource usage.

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks.


Objective: A personalized simulation tool, p-THYROSIM, was developed (1) to better optimize replacement LT4 and LT4+LT3 dosing for hypothyroid patients, based on individual hormone levels, BMIs, and gender; and (2) to better understand how gender and BMI impact thyroid dynamical regulation over time in these patients.

Methods: p-THYROSIM was developed by (1) modifying and refining THYROSIM, an established physiologically based mechanistic model of the system regulating serum T3, T4, and TSH level dynamics; (2) incorporating sex and BMI of individual patients into the model; and (3) quantifying it with 3 experimental datasets and validating it with a fourth containing data from distinct male and female patients across a wide range of BMIs. For validation, we compared our optimized predictions with previously published results on optimized LT4 monotherapies.


Motivation: Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers.

Results: We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations.


Background: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

Results: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously.
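To make the idea of iterative hard thresholding concrete, here is a minimal sketch for ordinary least squares: each iteration takes a gradient step on the squared-error loss and then keeps only the `k` largest coefficients. This is a textbook version under simplifying assumptions (dense NumPy matrix, fixed step size, known sparsity level `k`); it is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np

def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero out the rest."""
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argpartition(np.abs(beta), -k)[-k:]
        out[keep] = beta[keep]
    return out

def iht(X, y, k, n_iter=300):
    """Approximately minimize 0.5 * ||y - X @ beta||^2 subject to at most k nonzeros."""
    n_samples, n_features = X.shape
    # Fixed step 1/L, where L is the largest eigenvalue of X.T @ X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        gradient = X.T @ (X @ beta - y)
        beta = hard_threshold(beta - step * gradient, k)
    return beta

# Simulated example: 3 causal predictors out of 50 (data made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[[3, 17, 42]] = [3.0, -2.0, 2.5]
y_demo = X_demo @ beta_true
print(np.flatnonzero(iht(X_demo, y_demo, k=3)))
```

Because all predictors enter the model at once, the estimate for each retained predictor is adjusted for the others, in contrast to one-at-a-time univariate regression.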


Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology.
