Bam-readcount is a utility for generating low-level information about sequencing data at specific nucleotide positions. Originally designed to help filter genomic mutation calls, the metrics it outputs are useful as input for variant detection tools and for resolving ambiguity between variant callers. In addition, it has found broad applicability in diverse fields including tumor evolution, single-cell genomics, climate change ecology, and tracking community spread of SARS-CoV-2.
The contribution of genome structural variation (SV) to quantitative traits associated with cardiometabolic diseases remains largely unknown. Here, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. We used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data of 4,848 individuals.
Tumor heterogeneity and evolution drive treatment resistance in metastatic colorectal cancer (mCRC). Patient-derived xenografts (PDXs) can model mCRC biology; however, their ability to accurately mimic human tumor heterogeneity is unclear. Current genomic studies in mCRC have limited scope and lack matched PDXs.
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes.
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits.
Summary: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes.
The original version of this Article contained errors in the depiction of confidence intervals in the NF1 BCSS data illustrated in Figure 3b. These have now been corrected in both the PDF and HTML versions of the Article. The incorrect version of Figure 3b is presented in the associated Author Correction.
Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing.
Nat Commun
September 2018
Here we report targeted sequencing of 83 genes using DNA from primary breast cancer samples from 625 postmenopausal (UBC-TAM series) and 328 premenopausal (MA12 trial) hormone receptor-positive (HR+) patients to determine interactions between somatic mutation and prognosis. Independent validation of prognostic interactions was achieved using data from the METABRIC study. Previously established associations between MAP3K1 and PIK3CA mutations with luminal A status/favorable prognosis and TP53 mutations with Luminal B/non-luminal tumors/poor prognosis were observed, validating the methodological approach.
To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases.
CIViC is an expert-crowdsourced knowledgebase for Clinical Interpretation of Variants in Cancer describing the therapeutic, prognostic, diagnostic and predisposing relevance of inherited and somatic variants of all types. CIViC is committed to open-source code, open-access content, public application programming interfaces (APIs) and provenance of supporting evidence to allow for the transparent creation of current and accurate variant interpretations for use in cancer precision medicine.
Cancer Epidemiol Biomarkers Prev
November 2016
Background: Common variants have been associated with prostate cancer risk. Unfortunately, few are reproducibly linked to aggressive disease, the phenotype of greatest clinical relevance. One possible explanation is that rare genetic variants underlie a significant proportion of the risk for aggressive disease.
Objective: In many rheumatoid arthritis (RA) patients, disease is controlled with anti-tumor necrosis factor (anti-TNF) biologic therapies. However, in a significant number of patients, the disease fails to respond to anti-TNF therapy. We undertook the present study to examine the hypothesis that rare and low-frequency genetic variants might influence response to anti-TNF treatment.
Estrogen receptor alpha-positive (ERα+) luminal tumors are the most frequent subtype of breast cancer. Stat1(-/-) mice develop mammary tumors that closely recapitulate the biological characteristics of this cancer subtype. To identify transforming events that contribute to tumorigenesis, we performed whole genome sequencing of Stat1(-/-) primary mammary tumors and matched normal tissues.
Resistance to oestrogen-deprivation therapy is common in oestrogen-receptor-positive (ER+) breast cancer. To better understand the contributions of tumour heterogeneity and evolution to resistance, here we perform comprehensive genomic characterization of 22 primary tumours sampled before and after 4 months of neoadjuvant aromatase inhibitor (NAI) treatment. Comparing whole-genome sequencing of tumour/normal pairs from the two time points, with coincident tumour RNA sequencing, reveals widespread spatial and temporal heterogeneity, with marked remodelling of the clonal landscape in response to NAI.
The genomic events responsible for the pathogenesis of relapsed adult B-lymphoblastic leukemia (B-ALL) are not yet clear. We performed integrative analysis of whole-genome, whole-exome, custom capture, whole-transcriptome (RNA-seq), and locus-specific genomic assays across nine time points from a patient with primary de novo B-ALL. Comprehensive genome and transcriptome characterization revealed a dramatic tumor evolution during progression, yielding a tumor with complex clonal architecture at second relapse.
Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cases from The Cancer Genome Atlas cancer cases representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer.
Cell Syst
September 2015
Tumors are typically sequenced to depths of 75-100× (exome) or 30-50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample.
Importance: Tests that predict outcomes for patients with acute myeloid leukemia (AML) are imprecise, especially for those with intermediate risk AML.
Objectives: To determine whether genomic approaches can provide novel prognostic information for adult patients with de novo AML.
Design, Setting, And Participants: Whole-genome or exome sequencing was performed on samples obtained at disease presentation from 71 patients with AML (mean age, 50.
PLoS Comput Biol
July 2015
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system.
Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.
Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity.
Curr Protoc Bioinformatics
December 2013
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios.
Curr Protoc Bioinformatics
July 2016
Detecting somatic single nucleotide variants (SNVs) is an essential component of cancer research with next generation sequencing data. This protocol describes how to run the SomaticSniper somatic SNV detector and then filter the output to eliminate most false positives. It also includes support protocols detailing the compilation of the software.