Publications by authors named "Georgy Meshcheryakov"

High-throughput sequencing facilitates large-scale studies of gene regulation and allows tracing the associations of individual genomic variants with changes in gene regulation and expression. Compared to classic association studies, the assessment of an allelic imbalance at heterozygous variants captures functional variant effects with smaller sample sizes, higher sensitivity, and better resolution. Yet, identification of allele-specific variants from allelic read counts remains challenging due to data-dependent biases and overdispersion arising from technical and biological variability.


We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome.

Article Synopsis
  • A systematic evaluation is necessary to understand how different model architectures and training strategies affect the performance of genomics models, prompting the organization of a DREAM Challenge.
  • In the challenge, competitors used a vast dataset of yeast DNA sequences and expression levels to train models, with the best models employing various neural network architectures and training approaches.
  • The development of the Prix Fixe framework allowed for an in-depth analysis of these models, which led to improved performance and showed that the top models not only excelled on yeast data but also outperformed existing benchmarks on Drosophila and human datasets.

Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression.


Motivation: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar.

Results: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.


A deeper knowledge of the dynamic transcriptional activity of promoters and enhancers is needed to improve mechanistic understanding of the pathogenesis of heart failure and heart diseases. In this study, we used cap analysis of gene expression (CAGE) to identify and quantify the activity of transcribed regulatory elements (TREs) in the four cardiac chambers of 21 healthy and ten failing adult human hearts. We identified 17,668 promoters and 14,920 enhancers associated with the expression of 14,519 genes.


C4 photosynthesis increases the efficiency of carbon fixation by spatially separating high concentrations of molecular oxygen from Rubisco. The specialized leaf anatomy required for this separation evolved independently many times. The morphology of C4 root systems is also distinctive and adapted to support high rates of photosynthesis; however, little is known about the molecular mechanisms that have driven the evolution of C4 root system architecture.


Background: There is a plethora of methods for genome-wide association studies. However, only a few of them may be classified as multi-trait and multi-locus, i.e.
