Article Abstract

A DNA sequence pattern, or "motif", is an essential representation of the DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and the computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications. We applied ten different DNA motif discovery tools to generate PWMs from the "Codebook" data, comprising 4,237 experiments from five different platforms profiling the DNA-binding specificity of 394 human proteins, focusing on understudied transcription factors of different structural families. For many of the proteins, there was no prior knowledge of a genuine motif. By benchmarking-supported human curation, we constructed an approved subset of experiments comprising about 30% of all experiments and 50% of tested TFs, which displayed consistent motifs across platforms and replicates. We present the Codebook Motif Explorer (https://mex.autosome.org), a detailed online catalog of DNA motifs, including the top-ranked PWMs and the underlying source and benchmarking data. We demonstrate that in the case of high-quality experimental data, most of the popular motif discovery tools detect valid motifs and generate PWMs that perform well on both genomic and synthetic data. Yet, for each of the algorithms, there were problematic combinations of proteins and platforms, and basic motif properties such as nucleotide composition and information content offered little help in detecting such pitfalls. By combining multiple PWMs in decision trees, we demonstrate how our setup can be readily adapted to train and test binding specificity models more complex than PWMs. Overall, our study provides a rich motif catalog as a solid baseline for advanced models and highlights the power of the multi-platform multi-tool approach for reliable mapping of DNA binding specificities.
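
For readers unfamiliar with PWMs, the following is a minimal sketch of how such a model is typically used to score DNA sequences. It is not the benchmarking code of the study: the count matrix, the pseudocount, the uniform background, and the function names `counts_to_pwm`, `score_window`, and `best_hit` are all illustrative assumptions, and only the forward strand is scanned.

```python
import math

ALPHABET = "ACGT"

# Hypothetical nucleotide counts for a 4-bp motif (rows = positions,
# columns = A, C, G, T). These numbers are made up for illustration.
COUNTS = [
    [8, 1, 0, 1],
    [0, 0, 10, 0],
    [1, 7, 1, 1],
    [0, 0, 0, 10],
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2-odds weights.

    Assumes a uniform background and a fixed pseudocount per nucleotide.
    """
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pwm


def score_window(pwm, window):
    """Sum the position-specific weights for a window of motif width."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Return (position, score) of the best-scoring forward-strand window."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    hits = [
        (i, score_window(pwm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits, key=lambda hit: hit[1])


pwm = counts_to_pwm(COUNTS)
position, score = best_hit(pwm, "TTAGCTAA")
print(position, round(score, 2))
```

Taking the best window score per sequence is one common way to turn a PWM into a per-sequence predictor; scores from several PWMs could then serve as input features for a more complex model such as a decision tree.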

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601219
DOI: http://dx.doi.org/10.1101/2024.11.11.619379

