Publications by Benjamin B Chu

It's a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline.

Benjamin B Chu, Zihuai He, Chiara Sabatti

bioRxiv

July 2025

The standard analysis pipeline for genome-wide association studies (GWAS) is based on marginal tests of association. These are computationally convenient and portable, but the discoveries resulting from their rejections are not immediately interpretable, and require post-processing such as "clumping" and "fine mapping." An interesting alternative is provided by conditional independence hypotheses: their rejections lead to the identification of distinct signals across the genome, accounting for measured confounders, and pointing to separate causal pathways.

Second-order group knockoffs with applications to genome-wide association studies.

Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candès

Bioinformatics

October 2024

Motivation: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance.

Results: While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.

A blended genome and exome sequencing method captures genetic variation in an unbiased, high-quality, and cost-effective manner.

Toni A Boltz, Benjamin B Chu, Calwing Liao, Julia M Sealock, Robert Ye, Stella Gichuru, Christopher Kachulis

bioRxiv

September 2024

We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low pass whole genome (1-4× mean depth) and deep whole exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations.

Second-order group knockoffs with applications to GWAS.

Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candes

ArXiv

March 2024

Conditional testing via the knockoff framework allows one to identify -- among a large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.

Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression.

Zhaomeng Chen, Zihuai He, Benjamin B Chu, Jiaqi Gu, Tim Morrison

ArXiv

February 2024

Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.

Multivariate genome-wide association analysis by iterative hard thresholding.

Benjamin B Chu, Seyoon Ko, Jin J Zhou, Aubrey Jensen, Hua Zhou

Bioinformatics

April 2023

Article Synopsis

Analyzing multiple correlated traits in genome-wide association studies is more effective than analyzing them individually, but traditional methods are computationally heavy.
A new algorithm called MendelIHT, implemented in a Julia package, uses iterative hard thresholding to improve efficiency and accuracy in these analyses, outperforming existing methods like GEMMA and mv-PLINK in speed and error rates.
The software and its documentation are freely available online, enabling researchers to conduct large-scale trait analyses with significant reductions in computation time and resource usage.

Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets.

Seyoon Ko, Benjamin B Chu, Daniel Peterson, Chidera Okenwa, Jeanette C Papp

Am J Hum Genet

February 2023

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks.

Optimized Replacement T4 and T4+T3 Dosing in Male and Female Hypothyroid Patients With Different BMIs Using a Personalized Mechanistic Model of Thyroid Hormone Regulation Dynamics.

Mauricio Cruz-Loya, Benjamin B Chu, Jacqueline Jonklaas, David F Schneider, Joseph DiStefano

Front Endocrinol (Lausanne)

August 2022

Objective: A personalized simulation tool, p-THYROSIM, was developed (1) to better optimize replacement LT4 and LT4+LT3 dosing for hypothyroid patients, based on individual hormone levels, BMIs, and gender; and (2) to better understand how gender and BMI impact thyroid dynamical regulation over time in these patients.

Methods: p-THYROSIM was developed by (1) modifying and refining THYROSIM, an established physiologically based mechanistic model of the system regulating serum T3, T4, and TSH level dynamics; (2) incorporating sex and BMI of individual patients into the model; and (3) quantifying it with 3 experimental datasets and validating it with a fourth containing data from distinct male and female patients across a wide range of BMIs. For validation, we compared our optimized predictions with previously published results on optimized LT4 monotherapies.

A fast data-driven method for genotype imputation, phasing and local ancestry inference: MendelImpute.jl.

Benjamin B Chu, Eric M Sobel, Rory Wasiolek, Seyoon Ko, Janet S Sinsheimer

Bioinformatics

December 2021

Motivation: Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers.

Results: We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations.

Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.

Benjamin B Chu, Kevin L Keys, Christopher A German, Hua Zhou, Jin J Zhou

Gigascience

June 2020

Background: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.

Results: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously.
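
For readers unfamiliar with iterative hard thresholding, the core idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordinary least-squares loss, a fixed step size, and a known sparsity level k, and the names `hard_threshold` and `iht` are hypothetical. Each iteration takes a gradient step on all coefficients at once and then keeps only the k largest in magnitude.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:k])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht(X, y, k, step=1.0, iters=100):
    """Plain least-squares IHT with a fixed step size (illustrative only).

    X is a list of n rows with p entries each, y has length n.
    The step size is an assumption; it must be small enough for the
    given X (step=1.0 is safe when the columns of X are orthonormal).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residual of the current fit: y - X * beta.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction for the least-squares loss: X^T * resid.
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step on every coefficient, then project onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


# Toy example: four "SNPs" with an identity design, two of which carry signal.
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0, 0.05]
beta_hat = iht(X, y, k=2)
print(beta_hat)  # [3.0, 0.0, -2.0, 0.0]
```

In this toy case the two small effects are zeroed out and the two large ones are retained. The generalized linear models, prior weights, and double sparsity named in the article's title are outside the scope of this sketch.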

OPENMENDEL: a cooperative programming project for statistical genetics.

Hua Zhou, Janet S Sinsheimer, Douglas M Bates, Benjamin B Chu, Christopher A German

Hum Genet

January 2020

Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology.
