Deep learning is a promising strategy for modeling cis-regulatory elements. However, models trained on genomic sequences often fail to explain why the same transcription factor can activate or repress transcription in different contexts. To address this limitation, we developed an active learning approach to train models that distinguish between enhancers and silencers composed of binding sites for the photoreceptor transcription factor cone-rod homeobox (CRX).
Background: Individual cells from isogenic populations often display large cell-to-cell differences in gene expression. This "noise" in expression derives from several sources, including the genomic and cellular environment in which a gene resides. Large-scale maps of genomic environments have revealed the effects of epigenetic modifications and transcription factor occupancy on mean expression levels, but leveraging such maps to explain expression noise will require new methods to assay how expression noise changes at locations across the genome.
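The abstract does not name a noise metric. A common choice in single-cell work, assumed here purely for illustration, is the squared coefficient of variation (CV², variance divided by the squared mean) of per-cell expression of a reporter at one genomic location. A minimal sketch, with a hypothetical function name and made-up toy values:

```python
from statistics import mean, pvariance

def noise_cv2(expression: list[float]) -> float:
    """Squared coefficient of variation (variance / mean**2) of per-cell expression."""
    mu = mean(expression)
    if mu == 0:
        raise ValueError("mean expression is zero; CV^2 is undefined")
    # Population variance across cells, reusing the already computed mean
    return pvariance(expression, mu) / mu ** 2

# Toy reporter measurements (made-up values) at two hypothetical genomic locations
site_a = [10, 12, 8, 10]
site_b = [2, 18, 10, 10]
print(noise_cv2(site_a), noise_cv2(site_b))  # same mean (10), different noise
```

Because CV² is normalized by the mean, two locations with identical mean expression can still differ in noise, which is the kind of position effect that mean-level genomic maps alone would miss.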
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome.
Stochastic differences among clonal cells can initiate cell fate decisions in development or cause cell-to-cell differences in the responses to drugs or extracellular ligands. One hypothesis is that some of this phenotypic variability is caused by stochastic fluctuations in the activities of transcription factors (TFs). We tested this hypothesis in NIH3T3-CG cells using the response to Hedgehog signaling as a model cellular response.
Postzygotic mutations (PZMs) begin to accrue in the human genome immediately after fertilization, but how and when PZMs affect development and lifetime health remain unclear. To study the origins and functional consequences of PZMs, we generated a multitissue atlas of PZMs spanning 54 tissue and cell types from 948 donors. Nearly half the variation in mutation burden among tissue samples can be explained by measured technical and biological effects, and 9% can be attributed to donor-specific effects.
View Article and Find Full Text PDFProc Natl Acad Sci U S A
February 2022
Pathogenic variants in surfactant proteins SP-B and SP-C cause surfactant deficiency and interstitial lung disease. Surfactant proteins are synthesized as precursors (proSP-B, proSP-C), trafficked, and processed via a vesicular-regulated secretion pathway; however, control of vesicular trafficking events is not fully understood. Through the Undiagnosed Diseases Network, we evaluated a child with interstitial lung disease suggestive of surfactant deficiency.
Nat Commun
November 2020
Miscarriage is a common, complex trait affecting ~15% of clinically confirmed pregnancies. Here we present the results of large-scale genetic association analyses with 69,054 cases from five different ancestries for sporadic miscarriage, 750 cases of European ancestry for multiple (≥3) consecutive miscarriage, and up to 359,469 female controls. We identify one genome-wide significant association (rs146350366, minor allele frequency (MAF) 1.
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples.
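Because inosine pairs with cytidine, it is read as guanosine after reverse transcription and sequencing, so the extent of editing at a site is commonly estimated from RNA-seq as the fraction of G reads among A and G reads at that position. The sketch below assumes that convention and uses hypothetical function names; it is not necessarily the quantification used in this study.

```python
def editing_level(a_reads: int, g_reads: int) -> float:
    """Fraction of reads showing G (inosine) among A + G reads at one site."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("site has no A or G read coverage")
    return g_reads / total

def pooled_editing_level(sites: list[tuple[int, int]]) -> float:
    """Coverage-weighted editing level over many (a_reads, g_reads) sites."""
    total_a = sum(a for a, _ in sites)
    total_g = sum(g for _, g in sites)
    return editing_level(total_a, total_g)

print(editing_level(30, 10))                       # 0.25
print(pooled_editing_level([(30, 10), (10, 10)]))  # 20 / 60
```

Pooling reads before dividing weights well-covered sites more heavily than averaging per-site levels would; which summary is appropriate depends on the question being asked.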
CIViC is an expert-crowdsourced knowledgebase for Clinical Interpretation of Variants in Cancer describing the therapeutic, prognostic, diagnostic and predisposing relevance of inherited and somatic variants of all types. CIViC is committed to open-source code, open-access content, public application programming interfaces (APIs) and provenance of supporting evidence to allow for the transparent creation of current and accurate variant interpretations for use in cancer precision medicine.
The genomic events responsible for the pathogenesis of relapsed adult B-lymphoblastic leukemia (B-ALL) are not yet clear. We performed integrative analysis of whole-genome, whole-exome, custom capture, whole-transcriptome (RNA-seq), and locus-specific genomic assays across nine time points from a patient with primary de novo B-ALL. Comprehensive genome and transcriptome characterization revealed a dramatic tumor evolution during progression, yielding a tumor with complex clonal architecture at second relapse.
Cell Syst
September 2015
Tumors are typically sequenced to depths of 75-100× (exome) or 30-50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample.
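Why conventional depth can be inadequate for impure or clonally heterogeneous tumors can be seen with a simple binomial model: a heterozygous mutation present in only some tumor cells has a variant allele fraction (VAF) well below 0.5, so few supporting reads are expected at modest depth. The sketch below assumes diploid tumor and normal genomes, no sequencing error, and a hypothetical caller threshold `min_alt_reads`; it illustrates the reasoning and is not the analysis performed in the study.

```python
from math import comb

def expected_vaf(purity: float, cancer_cell_fraction: float) -> float:
    """VAF of a heterozygous mutation when tumor and normal cells are both diploid."""
    return purity * cancer_cell_fraction / 2

def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) if alt reads ~ Binomial(depth, vaf)."""
    if depth < 0 or not 0.0 <= vaf <= 1.0:
        raise ValueError("need depth >= 0 and 0 <= vaf <= 1")
    # Probability of seeing fewer than min_alt_reads supporting reads
    miss = sum(comb(depth, i) * vaf ** i * (1 - vaf) ** (depth - i)
               for i in range(min_alt_reads))
    return 1.0 - miss

# A subclonal mutation in 20% of tumor cells of a 50%-pure sample (illustrative numbers)
vaf = expected_vaf(purity=0.5, cancer_cell_fraction=0.2)   # 0.05
for depth in (30, 100, 312):
    print(depth, round(detection_probability(depth, vaf, min_alt_reads=3), 3))
```

Real callers also model sequencing error and depth variability, which lowers sensitivity further at low VAF.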
Purpose: This trial was conducted to determine the maximum tolerated dose (MTD) and preliminary efficacy of buparlisib, an oral pan-class I PI3K inhibitor, plus fulvestrant in postmenopausal women with metastatic estrogen receptor positive (ER(+)) breast cancer.
Experimental Design: Phase IA employed a 3+3 design to determine the MTD of buparlisib daily plus fulvestrant. Subsequent cohorts (phase IB and cohort C) evaluated intermittent (5/7-day) and continuous dosing of buparlisib (100 mg daily).
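The 3+3 design mentioned above is a rule-based escalation scheme. The sketch below encodes the textbook version: escalate after 0/3 dose-limiting toxicities (DLTs), expand to 6 patients after 1/3, stop escalating after 2 or more DLTs, and take the MTD as the highest dose with at most 1 DLT among 6 patients. The trial protocol may differ in details; `three_plus_three`, the `observe_dlts` callback, and `scripted_outcomes` are hypothetical names.

```python
from typing import Callable, Optional

def three_plus_three(n_levels: int,
                     observe_dlts: Callable[[int, int], int]) -> Optional[int]:
    """Textbook 3+3 rule. observe_dlts(level, n) treats n new patients at `level`
    and returns their DLT count. Returns the MTD level index, -1 if even the
    lowest dose is too toxic, or None if no tested dose was too toxic."""
    treated = [0] * n_levels
    dlts = [0] * n_levels

    def treat(level: int, n: int) -> None:
        treated[level] += n
        dlts[level] += observe_dlts(level, n)

    exceeded = None
    for level in range(n_levels):
        treat(level, 3)
        if dlts[level] == 1:       # 1/3 DLTs: expand this dose to 6 patients
            treat(level, 3)
        if dlts[level] >= 2:       # >=2/3 or >=2/6: dose exceeds the MTD
            exceeded = level
            break
    if exceeded is None:
        return None
    # De-escalate: the MTD must show <= 1 DLT among 6 treated patients
    for level in range(exceeded - 1, -1, -1):
        if treated[level] == 3:    # only 3 treated so far (0 DLTs): expand to 6
            treat(level, 3)
        if dlts[level] <= 1:
            return level
    return -1

def scripted_outcomes(outcomes: dict[int, list[int]]) -> Callable[[int, int], int]:
    """Hypothetical helper: replay pre-specified DLT counts, one entry per cohort of 3."""
    queues = {level: list(counts) for level, counts in outcomes.items()}
    return lambda level, n: queues[level].pop(0)

# Dose 0: 0/3; dose 1: 1/3 then 0/3 (1/6); dose 2: 2/3, so the MTD is dose 1
print(three_plus_three(3, scripted_outcomes({0: [0], 1: [1, 0], 2: [2]})))  # 1
```

Passing the toxicity outcomes in as a callback keeps the escalation rule separate from how outcomes are observed or simulated.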
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system.
Broad and deep tumour genome sequencing has shed new light on tumour heterogeneity and provided important insights into the evolution of metastases arising from different clones. There is an additional layer of complexity, in that tumour evolution may be influenced by selective pressure provided by therapy, in a similar fashion to that occurring in infectious diseases. Here we studied tumour genomic evolution in a patient (index patient) with metastatic breast cancer bearing an activating PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha, PI(3)Kα) mutation.
We present DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations. We used DeNovoGear on human whole-genome sequencing data to produce a set of predicted de novo insertion and/or deletion (indel) mutations with a 95% validation rate.
Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's risk of disease by 10% (OR 1.
BMC Bioinformatics
October 2012
Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees.
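To make the problem statement concrete, the sketch below checks the definition by brute force. It assumes rooted trees written as nested tuples over a shared taxon set, and treats a taxon subset as a frequent agreement subtree when at least `min_support` trees induce the same topology on it. Because it enumerates every subset, it is exponential in the number of taxa: exactly the scaling barrier described above, and not the authors' method. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Optional, Set, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of child subtrees

def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def restrict(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Induced subtree on `keep`: prune other leaves, then suppress unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def canonical(tree: Tree):
    """Child order is irrelevant in a rooted topology, so use nested frozensets."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)

def max_frequent_agreement_subtree(trees, min_support):
    """Largest taxon subset whose induced topology is shared by >= min_support trees
    (ties between equally large subsets are broken by lexicographic order)."""
    taxa = sorted(leaves(trees[0]))  # assumes every tree has the same taxa
    for size in range(len(taxa), 0, -1):
        for subset in combinations(taxa, size):
            keep = set(subset)
            counts = Counter(canonical(restrict(t, keep)) for t in trees)
            topology, support = counts.most_common(1)[0]
            if support >= min_support:
                return keep, topology
    return set(), None

t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
t3 = ((("a", "c"), "b"), "d")
print(sorted(max_frequent_agreement_subtree([t1, t2, t3], min_support=2)[0]))  # ['a', 'b', 'c']
```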