Background: Treating cancer depends in part on identifying the mutations driving each patient's disease. Many clinical laboratories are adopting high-throughput sequencing for assaying patients' tumours, applying targeted panels to formalin-fixed paraffin-embedded tumour tissues to detect clinically relevant mutations. While there have been some benchmarking and best-practice studies of this scenario, much variant-calling work focuses on whole-genome or whole-exome studies with fresh or fresh-frozen tissue.
Nucleic Acids Res
August 2018
Muscle-specific transcription factor MyoD orchestrates the myogenic gene expression program by binding to short DNA motifs called E-boxes within myogenic cis-regulatory elements (CREs). Genome-wide analyses of the MyoD cistrome by chromatin immunoprecipitation sequencing show that MyoD-bound CREs contain multiple E-boxes of various sequences. However, how E-box numbers, sequences and their spatial arrangement within CREs collectively regulate the binding affinity and transcriptional activity of MyoD remains largely unknown.
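E-boxes are conventionally described by the consensus CANNTG. As a minimal sketch (not the analysis used in the study), counting the E-boxes in a CRE and tallying their central dinucleotides, the number and sequence features the abstract refers to, could look like this; the regular expression, the function names and the example sequence are illustrative assumptions.

```python
import re
from collections import Counter

# E-box consensus CANNTG (N = any base). The consensus is its own reverse
# complement, so scanning one strand finds every site. The lookahead lets
# overlapping sites (e.g. the two in CACATGTG) be reported separately.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based start, site) for every E-box in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


def central_dinucleotides(seq):
    """Tally the NN core of each E-box found in seq."""
    return Counter(site[2:4] for _, site in find_eboxes(seq))


if __name__ == "__main__":
    cre = "ttCAGCTGaaCAGGTGccCACCTG"  # hypothetical CRE fragment
    print(find_eboxes(cre))
    print(central_dinucleotides(cre))
```

For the hypothetical fragment above, the sites CAGCTG, CAGGTG and CACCTG are found at positions 2, 10 and 18. CAGGTG and CACCTG are the same site read from opposite strands, so a real analysis would probably merge such reverse-complement pairs before comparing core sequences.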
Polycomb repressive complex 2 (PRC2) accessory proteins play substoichiometric, tissue-specific roles to recruit PRC2 to specific genomic loci or increase enzymatic activity, while PRC2 core proteins are required for complex stability and global levels of trimethylation of histone 3 at lysine 27 (H3K27me3). Here, we demonstrate a role for the classical PRC2 accessory protein Mtf2/Pcl2 in the hematopoietic system that is more akin to that of a core PRC2 protein. Mtf2-deficient erythroid progenitors demonstrate markedly decreased core PRC2 protein levels and a global loss of H3K27me3 at promoter-proximal regions.
Background: Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking.
Alpha-solenoids are flexible protein structural domains formed by ensembles of alpha-helical repeats (Armadillo and HEAT repeats among others). While homology can be used to detect many of these repeats, some alpha-solenoids have very little sequence homology to proteins of known structure and we expect that many remain undetected. We previously developed a method for detection of alpha-helical repeats based on a neural network trained on a dataset of protein structures.
Chromatin immunoprecipitation coupled with ultra-high-throughput sequencing (ChIP-seq) is a widely used method for mapping the interactions of proteins with DNA. However, the requirements for ChIP-grade antibodies impede wider application of this method, and variations in results can be high owing to differences in affinity and cross-reactivity of antibodies. Therefore, we developed chromatin tandem affinity purification (ChTAP) as an effective alternative to ChIP.
Motivation: Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of its impact on the accuracy of the enriched regions identified by peak-calling algorithms. Although many peak-calling algorithms include a fragment-length estimation subroutine, the problem has not been adequately solved, as demonstrated by the variability of the estimates returned by different algorithms.
Results: In this article, we investigate the use of strand cross-correlation to estimate mean fragment length of single-end data and show that traditional estimation approaches have mixed reliability.
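Strand cross-correlation relies on the observation that, around an enriched region, 5' ends of + strand reads pile up on one side and 5' ends of - strand reads on the other, offset by roughly one fragment length; the shift that best aligns the two strand profiles therefore estimates that length. The following is a minimal single-chromosome sketch with NumPy, not the estimator evaluated in the article. The 0-based coordinates, the choice to record - strand reads by their 5' (rightmost) base, the function names and the toy simulator are all illustrative assumptions.

```python
import numpy as np


def _profile(positions, genome_len):
    """Count read 5' ends per base on one chromosome of length genome_len."""
    positions = np.asarray(positions, dtype=np.int64)
    if positions.size and (positions.min() < 0 or positions.max() >= genome_len):
        raise ValueError("read positions must lie in [0, genome_len)")
    return np.bincount(positions, minlength=genome_len).astype(float)


def strand_cross_correlation(plus_starts, minus_ends, genome_len, max_shift):
    """Pearson correlation between the + strand profile at x and the
    - strand profile at x + d, for every shift d in 0..max_shift."""
    plus = _profile(plus_starts, genome_len)
    minus = _profile(minus_ends, genome_len)
    cc = np.full(max_shift + 1, np.nan)
    for d in range(max_shift + 1):
        a, b = plus[: genome_len - d], minus[d:]
        if a.std() > 0 and b.std() > 0:  # correlation is undefined for flat profiles
            cc[d] = np.corrcoef(a, b)[0, 1]
    return cc


def estimate_fragment_length(plus_starts, minus_ends, genome_len, max_shift):
    """A fragment spanning [s, s + L - 1] yields a + read starting at s and a
    - read whose 5' end is at s + L - 1, so the best shift is L - 1."""
    cc = strand_cross_correlation(plus_starts, minus_ends, genome_len, max_shift)
    return int(np.nanargmax(cc)) + 1


def simulate_chip_reads(n_sites=100, reads_per_site=800, frag_len=200,
                        genome_len=200_000, seed=0):
    """Hypothetical toy data: fixed-length fragments piled up over random
    binding sites, each sequenced from a randomly chosen strand."""
    rng = np.random.default_rng(seed)
    sites = rng.integers(frag_len, genome_len - frag_len, size=n_sites)
    starts = np.concatenate(
        [s - rng.integers(0, frag_len, size=reads_per_site) for s in sites])
    on_plus = rng.random(starts.size) < 0.5
    return starts[on_plus], starts[~on_plus] + frag_len - 1


if __name__ == "__main__":
    plus, minus = simulate_chip_reads()
    print(estimate_fragment_length(plus, minus, genome_len=200_000, max_shift=400))
```

The final `+ 1` follows only from the coordinate convention chosen here; with other conventions the peak shift itself would be reported as the fragment length.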
BioData Min
September 2012
Background: Selecting reviewers and editors for peer review is becoming harder for authors and publishers, because the progressive growth of the body of knowledge has driven specialization into ever narrower areas of research. Examining the literature helps in finding appropriate reviewers, but it is time-consuming and complicated by author name ambiguities.
Results: We have developed a method called peer2ref to support authors and editors in selecting suitable reviewers for scientific manuscripts.
Pax3 and Pax7 regulate stem cell function in skeletal myogenesis. However, molecular insight into their distinct roles has remained elusive. Using gene expression data combined with genome-wide binding-site analysis, we show that both Pax3 and Pax7 bind identical DNA motifs and jointly activate a large panel of genes involved in muscle stem cell function.
Background: In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous, GC-content-changing substitutions at the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third-position GC content. For individual amino acids as well, usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias.
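GC3 is the fraction of third codon positions occupied by G or C. A minimal Python sketch, assuming an in-frame coding sequence without introns, is shown below. Dropping stop codons, and optionally the single-codon amino acids Met (ATG) and Trp (TGG), whose third position cannot change synonymously (a variant often reported as GC3s), are choices made here for illustration and not necessarily those of the study; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Met and Trp are each encoded by a single codon, so their third position
# cannot change through a synonymous substitution.
SINGLE_CODON_AAS = {"ATG", "TGG"}


def codons(cds):
    """Split an in-frame coding sequence (DNA or RNA) into codons."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def gc_content(seq):
    """Overall fraction of G or C bases in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds, synonymous_only=False):
    """Fraction of third codon positions that are G or C, ignoring stop codons.
    With synonymous_only=True, ATG and TGG are skipped as well (GC3s-style)."""
    kept = [c for c in codons(cds) if c not in STOP_CODONS]
    if synonymous_only:
        kept = [c for c in kept if c not in SINGLE_CODON_AAS]
    if not kept:
        raise ValueError("no codons left to score")
    return sum(c[2] in "GC" for c in kept) / len(kept)
```

For example, in ATGGCCAAGTTTGGGTAA the stop codon is dropped and the five remaining third positions are G, C, G, T and G, so `gc3` returns 0.8; skipping ATG as well gives 0.75. Comparing such per-gene GC3 values with `gc_content` is one simple way to look at the correlation the abstract describes.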
The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time-consuming. The authors' MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity.
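The kind of chronological term analysis described here can be sketched as counting, for each publication year, the fraction of records that mention a term, and taking the earliest year with a match as the term's emergence date. This is a minimal Python illustration over an in-memory list of (year, text) pairs, assuming case-insensitive whole-word matching; it is not the MLTrends implementation, and the function name and sample records are hypothetical.

```python
import re
from collections import Counter


def term_trend(records, term):
    """records: iterable of (year, text) pairs, e.g. titles or abstracts.

    Returns (emergence_year, trend): trend maps each year to the fraction
    of that year's records mentioning the term, and emergence_year is the
    first year with any match (None if the term never appears)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for year, text in records:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    trend = {year: hits[year] / totals[year] for year in sorted(totals)}
    emergence = min(hits) if hits else None
    return emergence, trend


if __name__ == "__main__":
    sample = [  # hypothetical records
        (1990, "Gene expression in yeast"),
        (1990, "A protein study"),
        (1991, "Microarray analysis of gene expression"),
        (1992, "Normalizing microarray data"),
        (1992, "RNA sequencing of tumours"),
    ]
    print(term_trend(sample, "microarray"))
```

Normalizing by the number of records per year, rather than plotting raw counts, keeps the overall growth of the database from masquerading as rising interest in a term.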
Skeletal muscle ageing is characterized by faulty degenerative/regenerative processes that promote the decline of its mass, strength, and endurance. In this study, we used a transcriptional profiling method to better understand the molecular pathways and factors that contribute to these processes. To more appropriately contrast the differences in the regenerative capacity of old muscle, we compared it with young muscle, in which robust growth and efficient myogenic differentiation are ongoing.
Background: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation.
Findings: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments.
A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although most likely originating by tandem duplication of a two-helix unit, their detection using sequence similarity between repeats is poor.
StemBase is a database of gene expression data obtained from stem cells and their derivatives, mainly from mouse and human, using DNA microarrays and Serial Analysis of Gene Expression. Here, we describe this database and indicate ways to use it to study the expression of particular genes in stem cells or to search for genes with particular expression profiles in stem cells, which could be associated with stem cell function or used as stem cell markers.