A DNA sequence pattern, or "motif", is an essential representation of the DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and of the computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications. We applied ten different DNA motif discovery tools to generate PWMs from the "Codebook" data comprising 4,237 experiments from five different platforms profiling the DNA-binding specificity of 394 human proteins, focusing on understudied transcription factors of different structural families. For many of the proteins, there was no prior knowledge of a genuine motif. Through benchmarking-supported human curation, we constructed an approved subset of experiments comprising about 30% of all experiments and 50% of tested TFs that displayed consistent motifs across platforms and replicates. We present the Codebook Motif Explorer (https://mex.autosome.org), a detailed online catalog of DNA motifs, including the top-ranked PWMs, and the underlying source and benchmarking data. We demonstrate that in the case of high-quality experimental data, most of the popular motif discovery tools detect valid motifs and generate PWMs that perform well on both genomic and synthetic data. Yet, for each of the algorithms, there were problematic combinations of proteins and platforms, and basic motif properties such as nucleotide composition and information content offered little help in detecting such pitfalls. By combining multiple PWMs in decision trees, we demonstrate how our setup can be readily adapted to train and test binding specificity models more complex than PWMs.
Overall, our study provides a rich motif catalog as a solid baseline for advanced models and highlights the power of the multi-platform multi-tool approach for reliable mapping of DNA binding specificities.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601219 | PMC |
http://dx.doi.org/10.1101/2024.11.11.619379 | DOI Listing |