Publications by authors named "Benjamin J Strober"

Complex diseases often have distinct mechanisms spanning multiple tissues. We propose tissue-gene fine-mapping (TGFM), which infers the posterior inclusion probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing summary statistics and expression quantitative trait loci (eQTL) data; TGFM also assigns PIPs to non-mediated variants. TGFM accounts for co-regulation across genes and tissues and models uncertainty in cis-predicted expression models, enabling correct calibration.

Article Synopsis
  • Leveraging data from different ancestries enhances fine-mapping power by accounting for variations in allele frequencies and linkage disequilibrium.
  • The proposed method, MultiSuSiE, extends the traditional SuSiE model to analyze multiple ancestries, allowing for varying causal effect sizes based on empirical data using a multivariate normal prior.
  • In simulations and analyses of large-scale genomic data, MultiSuSiE showed higher power and lower computational costs than previous methods, successfully identifying more fine-mapped variants and validating these findings through functional enrichment analyses.
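
As an illustration of the posterior inclusion probabilities that SuSiE-style fine-mapping reports, the sketch below combines per-effect posteriors into one PIP per variant. It assumes the standard "sum of single effects" setup, in which each of L single effects has a posterior distribution over p variants. The function name `pip_from_alpha` and the toy numbers are hypothetical and are not taken from MultiSuSiE or its software.

```python
import numpy as np

def pip_from_alpha(alpha):
    """Combine single-effect posteriors into per-variant PIPs.

    alpha: (L, p) array; row l is the posterior probability that each of
    the p variants is the one carrying single effect l (rows sum to 1).
    A variant is included if at least one single effect selects it.
    """
    alpha = np.asarray(alpha, dtype=float)
    return 1.0 - np.prod(1.0 - alpha, axis=0)

# Toy example (made-up values): two single effects over three variants.
alpha = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
pips = pip_from_alpha(alpha)
```

A multi-ancestry method would presumably fit the per-effect posteriors jointly across ancestries before this step; that fitting procedure is not shown here.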
Article Synopsis
  • Genetic regulation of gene expression varies based on different cell types and environmental contexts, making it a complex process.
  • SURGE is a new method developed to discover context-specific expression quantitative trait loci (eQTLs) from single-cell transcriptomic data without needing prior information.
  • When applied to peripheral blood data, SURGE effectively identifies specific cell types and their relationships, demonstrating its relevance to diseases through advanced analysis methods.
Article Synopsis
  • The study introduces a new method called LDSPEC to estimate the relationship between causal disease effect sizes of nearby SNPs, challenging the assumption that they are independent.
  • It analyzes data from 70 diseases in the UK Biobank, discovering significant correlations in effect sizes among proximal SNP pairs, which vary based on different factors such as distance and allele frequency.
  • The research finds that SNP pairs with related functions show stronger correlations extending over longer genomic distances, and it reveals that SNP-heritability estimates are lower than previously thought, indicating a discrepancy between expected and real genetic contributions to diseases.
Article Synopsis
  • The study investigates the relationships between causal disease effect sizes of proximal SNPs (single nucleotide polymorphisms) using a new method called LDSPEC, suggesting that these SNPs are not independent as previously thought.
  • By applying LDSPEC to data from 70 diseases in the UK Biobank, researchers found that the correlations in effect sizes between nearby SNPs varied based on distance, allele frequency, and linkage disequilibrium (LD), indicating complex interactions.
  • The results reveal that SNP pairs with shared functions show stronger correlations over longer distances, leading to a significant discrepancy between SNP-heritability estimates and the total variance of causal effect sizes, challenging prior assumptions in genetic research.
Article Synopsis
  • TGFM is a new method for identifying the relationship between specific genes, tissues, and heritable diseases by analyzing genetic data from diverse tissues and GWAS summary statistics.
  • It improves on existing fine-mapping methods by considering co-regulation across genes, the linkage disequilibrium between SNPs, and incorporating tissue-level contributions to disease.
  • When applied to diseases in the UK Biobank, TGFM successfully identified numerous causal genetic elements and gene-tissue pairs, validating known biology while also revealing novel associations relevant to various diseases.
Article Synopsis
  • Identifying impactful rare genetic variants is difficult, but using personal multi-omics can help overcome this challenge, as shown in a study involving several hundred individuals over 10 years.
  • By analyzing whole-genome sequencing and other omics data, researchers found that combining expression and protein data significantly increased the detection of rare stop and frameshift variants.
  • A new Bayesian hierarchical model called "Watershed" was used to prioritize rare variants linked to significant traits, revealing variants that influence complex conditions like height, schizophrenia, and Alzheimer's disease.

Differential allele-specific expression (ASE) is a powerful tool to study context-specific cis-regulation of gene expression. Such effects can reflect the interaction between genetic or epigenetic factors and a measured context or condition. Single-cell RNA sequencing (scRNA-seq) allows the measurement of ASE at individual-cell resolution, but there is a lack of statistical methods to analyze such data.

Practically all studies of gene expression in humans to date have been performed in a relatively small number of adult tissues. Gene regulation is highly dynamic and context-dependent. In order to better understand the connection between gene regulation and complex phenotypes, including disease, we need to be able to study gene expression in more cell types, tissues, and states that are relevant to human phenotypes.

Uncovering the functional impact of genetic variation on gene expression is important in understanding tissue biology and the pathogenesis of complex traits. Despite large efforts to map expression quantitative trait loci (eQTLs) across many human tissues, our ability to translate those findings to understanding human disease has been incomplete, and the majority of disease loci are not explained by association with expression of a target gene. Cell-type specificity and the presence of multiple independent causal variants for many eQTLs are potential confounders contributing to the apparent discrepancy with disease loci.

Dynamic and temporally specific gene regulatory changes may underlie unexplained genetic associations with complex disease. During a dynamic process such as cellular differentiation, the overall cell type composition of a tissue (or an in vitro culture) and the gene regulatory profile of each cell can both experience significant changes over time. To identify these dynamic effects in high resolution, we collected single-cell RNA-sequencing data over a differentiation time course from induced pluripotent stem cells to cardiomyocytes, sampled at 7 unique time points in 19 human cell lines.

Article Synopsis
  • Long non-coding RNA (lncRNA) genes play critical roles in biological functions, but identifying which ones are linked to diseases is challenging due to their large number.
  • The study analyzed data from the GTEx project to investigate the expression and associations of over 14,000 lncRNA genes across 49 tissues and 101 complex traits.
  • They discovered 1,432 lncRNA gene-trait associations, with many linked to diseases such as inflammatory bowel disease and diabetes, indicating these lncRNAs can have significant effects not explained by nearby protein-coding genes.

Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs.

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases.

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants.

Most variants implicated in common human disease by genome-wide association studies (GWAS) lie in noncoding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within implicated genomic regions remains a major challenge. Here we present a new sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) that encodes cell type-specific regulatory sequence vocabularies.
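
To make the idea of a sequence "vocabulary" concrete, the sketch below enumerates gapped k-mer features: windows of length l in which only k positions are informative and the rest are treated as gaps. This is a minimal illustration of the feature type only. The function name `gapped_kmer_counts`, the parameter values, and the explicit enumeration are assumptions made for clarity; they are not the authors' implementation, and practical gkm-SVM software computes the kernel without listing every feature.

```python
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers: length-l windows with k kept positions.

    Positions that are not kept are written as '-' (a gap).
    """
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Choose which k of the l positions stay informative.
        for keep in combinations(range(l), k):
            pattern = "".join(
                window[i] if i in keep else "-" for i in range(l)
            )
            counts[pattern] += 1
    return counts

# Toy sequence (made up): two windows, each giving C(4, 2) = 6 patterns.
counts = gapped_kmer_counts("ACGTA", l=4, k=2)
```

Under this framing, a variant's predicted effect could be scored by comparing classifier outputs for the reference and alternate sequences, though the scoring details are not given in the abstract above.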
