High-throughput sequencing facilitates large-scale studies of gene regulation and allows tracing the associations of individual genomic variants with changes in gene regulation and expression. Compared to classic association studies, the assessment of an allelic imbalance at heterozygous variants captures functional variant effects with smaller sample sizes, higher sensitivity, and better resolution. Yet, identification of allele-specific variants from allelic read counts remains challenging due to data-dependent biases and overdispersion arising from technical and biological variability.
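As a rough illustration of why overdispersion matters, the sketch below scores allelic imbalance at a single heterozygous site. It uses a two-sided test under two null models centred on a 0.5 allelic ratio: a plain binomial and a beta-binomial. The overdispersion parameter `rho`, its default value, and the example read counts are hypothetical. This is a generic textbook test, not the method described in the abstract.

```python
import math


def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binom_pmf(k, n, p=0.5):
    # P(reference reads = k) under a plain binomial model
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p))


def betabinom_pmf(k, n, p=0.5, rho=0.05):
    # Beta-binomial with mean p and overdispersion rho (hypothetical default).
    # Variance is n*p*(1-p)*(1 + (n-1)*rho), so rho -> 0 recovers the binomial.
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def two_sided_pvalue(ref, total, pmf):
    # Doubled smaller tail; adequate here because both null models are symmetric at p = 0.5
    probs = [pmf(k, total) for k in range(total + 1)]
    lower = sum(probs[: ref + 1])
    upper = sum(probs[ref:])
    return min(1.0, 2 * min(lower, upper))


# Hypothetical site: 30 reference reads out of 40 total
p_binom = two_sided_pvalue(30, 40, binom_pmf)
p_betabinom = two_sided_pvalue(30, 40, betabinom_pmf)
print(f"binomial p = {p_binom:.4g}, beta-binomial p = {p_betabinom:.4g}")
```

For the same counts, the beta-binomial p-value is larger. The extra read-count variance is absorbed by the overdispersion term instead of being attributed to allelic imbalance.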
We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), most of which are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and in trans, and identify tens of thousands of conserved, base-level binding sites in the human genome.
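To show how a motif can be used to locate candidate binding sites, here is a minimal position weight matrix (PWM) scan. The sketch assumes a log-odds PWM built from per-position base counts, with a pseudocount and a uniform background. The toy "GATA" count matrix, the pseudocount, and the background are hypothetical and are not taken from the Codebook data.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    # counts: one dict per motif position mapping base -> observed count
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def scan(sequence, pwm):
    # Score every window on the given strand; sequence must be uppercase A/C/G/T only.
    # Returns (start, score) pairs.
    width = len(pwm)
    return [
        (start, sum(pwm[i][base] for i, base in enumerate(sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pwm):
    # Highest-scoring window as (start, score)
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


# Hypothetical 4-position count matrix favouring "GATA"
toy_counts = [
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 19, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 19},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
pwm = log_odds_pwm(toy_counts)
print(best_hit("CCGATACC", pwm))
```

This sketch scores only the given strand. A genome-wide scan would also score the reverse complement and apply a score threshold.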
Nat Biotechnol
August 2025
Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression.
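Models of this kind typically take the DNA sequence as a one-hot matrix. The sketch below shows one common encoding, offered as generic preprocessing rather than the challenge's prescribed input format. The A/C/G/T column order, the all-zero treatment of ambiguous bases, and the optional fixed `length` with padding are assumptions.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(sequence, length=None):
    # Encode a DNA sequence as a list of 4-element rows (A, C, G, T).
    # Ambiguous bases such as N, and right-padding up to `length`, become all-zero rows.
    seq = sequence.upper()
    if length is not None:
        seq = seq[:length].ljust(length, "N")
    rows = []
    for base in seq:
        row = [0.0] * 4
        if base in BASE_INDEX:
            row[BASE_INDEX[base]] = 1.0
        rows.append(row)
    return rows


# Hypothetical promoter fragment padded to a fixed length of 6
print(one_hot("acgn", length=6))
```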
Bioinformatics
August 2023
Motivation: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar.
Results: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.
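The idea of treating regression as soft classification can be sketched as follows. A scalar expression value is spread into a probability distribution over discrete bins. A model would then be trained to predict that distribution, and its prediction is mapped back to a scalar as the expected bin index. LegNet's exact scheme is not given here, so the 18 integer bins, the Gaussian spread with `sigma = 0.5`, and the KL-divergence loss are all assumptions for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_label(value, n_bins=18, sigma=0.5):
    # Spread a scalar value over integer bins 0..n_bins-1 by integrating
    # N(value, sigma) over each bin [i - 0.5, i + 0.5), then renormalising
    mass = [normal_cdf(i + 0.5, value, sigma) - normal_cdf(i - 0.5, value, sigma) for i in range(n_bins)]
    total = sum(mass)
    return [m / total for m in mass]


def expected_value(probs):
    # Map a bin distribution back to a scalar estimate
    return sum(i * p for i, p in enumerate(probs))


def kl_divergence(target, predicted, eps=1e-12):
    # Loss between the soft target and a predicted distribution
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, predicted) if t > 0)


target = soft_label(7.3)
print(round(expected_value(target), 2))
print(kl_divergence(target, soft_label(9.0)))
```

One reason to prefer this over a plain regression loss is that the model outputs a full distribution over expression bins, which fits measurements that are themselves binned.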
A deeper knowledge of the dynamic transcriptional activity of promoters and enhancers is needed to improve mechanistic understanding of the pathogenesis of heart failure and heart diseases. In this study, we used cap analysis of gene expression (CAGE) to identify and quantify the activity of transcribed regulatory elements (TREs) in the four cardiac chambers of 21 healthy and ten failing adult human hearts. We identified 17,668 promoters and 14,920 enhancers associated with the expression of 14,519 genes.
C4 photosynthesis increases the efficiency of carbon fixation by spatially separating high concentrations of molecular oxygen from Rubisco. The specialized leaf anatomy required for this separation evolved independently many times. The morphology of C4 root systems is also distinctive and adapted to support high rates of photosynthesis; however, little is known about the molecular mechanisms that have driven the evolution of C4 root system architecture.
Background: There is a plethora of methods for genome-wide association studies. However, only a few of them may be classified as multi-trait and multi-locus, i.e.