Hopanoids are bacterial lipids that fortify membranes and enhance stress resistance. Derivatives of hopanoids known as "geo-hopanes" are abundant signatures of ancient bacteria in sediments, yet there are conflicting views on whether they are markers of specific taxa or environments. Here we analyze conservation of hopanoid biosynthesis across bacterial genomes using modern taxonomic tools.
Three-dimensional genome organization orchestrates recombination and transcription of immunoglobulin heavy chain (Igh) genes. The structure of wild-type (WT) alleles includes a prominent architectural stripe that extends from a cluster of CTCF binding elements at the 3' end of the locus (3'CBE), suggesting interactions of this end with sequences throughout the 2 Mb Igh TAD. Here we elucidate interplay between regulatory elements located in the 3'Igh domain (260 kb) that impact the stripe.
Nat Cell Biol
June 2024
Mammalian genomes are organized into three-dimensional DNA structures called A/B compartments that are associated with transcriptional activity/inactivity. However, whether these structures are simply correlated with gene expression or are permissive/impermissible to transcription has remained largely unknown because we lack methods to measure DNA organization and transcription simultaneously. Recently, we developed RNA & DNA (RD)-SPRITE, which enables genome-wide measurements of the spatial organization of RNA and DNA.
A fundamental question in gene regulation is how cell-type-specific gene expression is influenced by the packaging of DNA within the nucleus of each cell. We recently developed Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which enables mapping of higher-order interactions within the nucleus. SPRITE works by cross-linking interacting DNA, RNA and protein molecules and then mapping DNA-DNA spatial arrangements through an iterative split-and-pool barcoding method.
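The split-and-pool idea described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the published SPRITE protocol or its analysis software: the function names, the number of rounds and wells, and the example locus labels are all hypothetical. It shows only the core logic that molecules cross-linked in one complex travel together through every round, so they end up sharing a barcode and can be grouped afterwards.

```python
import random
from collections import defaultdict


def split_pool_barcode(complexes, rounds=6, wells=96, seed=0):
    """Toy split-and-pool barcoding (hypothetical helper, not the SPRITE pipeline).

    complexes maps a complex id to the list of molecules cross-linked in it.
    Returns a list of (barcode, molecule) pairs, one per molecule.
    """
    rng = random.Random(seed)
    barcodes = {cid: [] for cid in complexes}
    for _ in range(rounds):
        for cid in complexes:
            # A cross-linked complex lands in one well per round, so every
            # molecule in it receives the same tag for that round.
            barcodes[cid].append(rng.randrange(wells))
        # Pooling is implicit here: the next round redistributes all complexes.
    reads = []
    for cid, molecules in complexes.items():
        for molecule in molecules:
            reads.append((tuple(barcodes[cid]), molecule))
    return reads


def cluster_by_barcode(reads):
    """Group molecules that carry an identical barcode sequence."""
    clusters = defaultdict(set)
    for barcode, molecule in reads:
        clusters[barcode].add(molecule)
    return dict(clusters)


if __name__ == "__main__":
    # Illustrative input only; the locus labels are made up.
    example = {
        "complex_1": ["chr1:100", "chr1:900", "chr7:50"],
        "complex_2": ["chr2:10", "chr2:20"],
    }
    for barcode, members in cluster_by_barcode(split_pool_barcode(example)).items():
        print(barcode, sorted(members))
```

With enough rounds and wells, two unrelated complexes are unlikely to receive the same barcode by chance, which is why the tagging is repeated rather than done once.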
RNA, DNA, and protein molecules are highly organized within three-dimensional (3D) structures in the nucleus. Although RNA has been proposed to play a role in nuclear organization, exploring this has been challenging because existing methods cannot measure higher-order RNA and DNA contacts within 3D structures. To address this, we developed RNA & DNA SPRITE (RD-SPRITE) to comprehensively map the spatial organization of RNA and DNA.
Molecular switch proteins whose cycling between states is controlled by opposing regulators are central to biological signal transduction. As switch proteins function within highly connected interaction networks, the fundamental question arises of how functional specificity is achieved when different processes share common regulators. Here we show that functional specificity of the small GTPase switch protein Gsp1 in Saccharomyces cerevisiae (the homologue of the human protein RAN) is linked to differential sensitivity of biological processes to different kinetics of the Gsp1 (RAN) switch cycle.
Nat Biotechnol
January 2022
Although three-dimensional (3D) genome organization is central to many aspects of nuclear function, it has been difficult to measure at the single-cell level. To address this, we developed 'single-cell split-pool recognition of interactions by tag extension' (scSPRITE). scSPRITE uses split-and-pool barcoding to tag DNA fragments in the same nucleus and their 3D spatial arrangement.
Identifying the relationships between chromosome structures, nuclear bodies, chromatin states and gene expression is an overarching goal of nuclear-organization studies. Because individual cells appear to be highly variable at all these levels, it is essential to map different modalities in the same cells. Here we report the imaging of 3,660 chromosomal loci in single mouse embryonic stem (ES) cells using DNA seqFISH+, along with 17 chromatin marks and subnuclear structures by sequential immunofluorescence and the expression profile of 70 RNAs.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a recently identified coronavirus that causes the respiratory disease known as coronavirus disease 2019 (COVID-19). Despite the urgent need, we still do not fully understand the molecular basis of SARS-CoV-2 pathogenesis. Here, we comprehensively define the interactions between SARS-CoV-2 proteins and human RNAs.
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities.
Sensing and responding to signals is a fundamental ability of living systems, but despite substantial progress in the computational design of new protein structures, there is no general approach for engineering arbitrary new protein sensors. Here, we describe a generalizable computational strategy for designing sensor-actuator proteins by building binding sites de novo into heterodimeric protein-protein interfaces and coupling ligand sensing to modular actuation through split reporters. Using this approach, we designed protein sensors that respond to farnesyl pyrophosphate, a metabolic intermediate in the production of valuable compounds.
Eukaryotic genomes are packaged into a 3-dimensional structure in the nucleus. Current methods for studying genome-wide structure are based on proximity ligation. However, this approach can fail to detect known structures, such as interactions with nuclear bodies, because these DNA regions can be too far apart to directly ligate.
Methods Mol Biol
February 2018
Protein-protein interactions play critical roles in essentially every cellular process. These interactions are often mediated by protein interaction domains that enable proteins to recognize their interaction partners, often by binding to short peptide motifs. For example, PDZ domains, which are among the most common protein interaction domains in the human proteome, recognize specific linear peptide sequences that are often at the C-terminus of other proteins.
Nat Rev Mol Cell Biol
December 2016
Over the past decade, it has become clear that mammalian genomes encode thousands of long non-coding RNAs (lncRNAs), many of which are now implicated in diverse biological processes. Recent work studying the molecular mechanisms of several key examples - including Xist, which orchestrates X chromosome inactivation - has provided new insights into how lncRNAs can control cellular functions by acting in the nucleus. Here we discuss emerging mechanistic insights into how lncRNAs can regulate gene expression by coordinating regulatory proteins, localizing to target loci and shaping three-dimensional (3D) nuclear organization.
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein-ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein-ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites.
The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark.
Proc Natl Acad Sci U S A
October 2014
Reengineering protein-protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of "second-site suppressors," where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein-protein interfaces.
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility.
PLoS Comput Biol
July 2014
Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or from phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains.
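As a small illustration of what "covariation" between two sequence positions means, the sketch below scores a pair of alignment columns with mutual information. This is an assumption chosen for illustration: the abstract does not say which covariation metric the study used, and the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment is a list of equal-length sequences. A generic covariation
    measure for illustration, not necessarily the one used in the study.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        # Each observed pair contributes p(a,b) * log2(p(a,b) / (p(a) p(b))).
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Made-up toy alignment: positions 0 and 1 change together, position 2 is fixed.
    toy = ["AKL", "AKL", "DEL", "DEL"]
    print(mutual_information(toy, 0, 1))
    print(mutual_information(toy, 0, 2))
```

Raw mutual information is also inflated by the phylogenetic sampling bias mentioned in the abstract, so a high score alone does not show that structure or function constrains the pair.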
Methods Enzymol
August 2013
Sampling alternative conformations is key to understanding how proteins work and engineering them for new functions. However, accurately characterizing and modeling protein conformational ensembles remain experimentally and computationally challenging. These challenges must be met before protein conformational heterogeneity can be exploited in protein engineering and design.
Aberrant protein aggregation is a hallmark of many age-related diseases, yet little is known about whether proteins aggregate with age in a non-disease setting. Using a systematic proteomics approach, we identified several hundred proteins that become more insoluble with age in the multicellular organism Caenorhabditis elegans. These proteins are predicted to be significantly enriched in beta-sheets, which promote disease protein aggregation.