The diversity of cellular and tissue structures can arise from a few basic cell shapes, which undergo various transformations based on biophysical constraints on cytoskeletal organization. While cellular geometry has been linked with selected biological processes such as polarity, signaling or morphogenesis, the orchestration of the whole proteome in association with cell shape is still poorly understood. In this study, using more than 1 million images of single cells stained for 11,998 proteins across 10 cell lines in the Human Protein Atlas database, we performed an integrated analysis of organelle, pathway and single protein levels in association with a 2D cellular shapespace.
Human cells consist of a complex hierarchy of components, many of which remain unexplored. Here we construct a global map of human subcellular architecture through joint measurement of biophysical interactions and immunofluorescence images for over 5,100 proteins in U2OS osteosarcoma cells. Self-supervised multimodal data integration resolves 275 molecular assemblies spanning the range of 10^-8 to 10^-5 m, which we validate systematically using whole-cell size-exclusion chromatography and annotate using large language models.
Annu Rev Biomed Data Sci
August 2024
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature.
The spatial organization of molecules in a cell is essential for their functions. While current methods focus on discerning tissue architecture, cell-cell interactions, and spatial expression patterns, they are limited to the multicellular scale. We present Bento, a Python toolkit that takes advantage of single-molecule information to enable spatial analysis at the subcellular scale.
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP.
Human biology is tightly linked to proteins, yet most measurements do not precisely determine alternatively spliced sequences or posttranslational modifications. Here, we present the primary structures of ~30,000 unique proteoforms, nearly 10 times more than in previous studies, expressed from 1690 human genes across 21 cell types and plasma from human blood and bone marrow. The results, compiled in the Blood Proteoform Atlas (BPA), indicate that proteoforms better describe protein-level biology and are more specific indicators of differentiation than their corresponding proteins, which are more broadly expressed across cell types.
Interpreting proteomics data remains challenging due to the large number of proteins that are quantified by modern mass spectrometry methods. Weighted gene correlation network analysis (WGCNA) can identify groups of biologically related proteins using only protein intensity values by constructing protein correlation networks. However, WGCNA is not widespread in proteomic analyses due to challenges in implementing workflows.
The cell cycle, over which cells grow and divide, is a fundamental process of life. Its dysregulation has devastating consequences, including cancer. The cell cycle is driven by precise regulation of proteins in time and space, which creates variability between individual proliferating cells.
Cellular heterogeneity is an important biological phenomenon observed across space and time in human tissues. Imaging-based spatial proteomic technologies can provide fruitful new readouts of phenotypic states for individual cells at subcellular resolution, which may help unravel the roles of non-genetic cellular heterogeneity in tumorigenesis and drug resistance.
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.
Nat Methods
September 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
The nucleolus is essential for ribosome biogenesis and is involved in many other cellular functions. We performed a systematic spatiotemporal dissection of the human nucleolar proteome using confocal microscopy. In total, 1,318 nucleolar proteins were identified; 287 were localized to fibrillar components, and 157 were enriched along the nucleoplasmic border, indicating a potential fourth nucleolar subcompartment: the nucleoli rim.
Identifying single amino acid variants (SAAVs) in cancer is critical for precision oncology. Several advanced algorithms are now available to identify SAAVs, but attempts to combine different algorithms and optimize them on large data sets to achieve a more comprehensive coverage of SAAVs have not been implemented. Herein, we report an expanded detection of SAAVs in the PANC-1 cell line using three different strategies, which results in the identification of 540 SAAVs in the mass spectrometry data.
Pinpointing subcellular protein localizations from microscopy images is easy to the trained eye, but challenging to automate. Based on the Human Protein Atlas image collection, we held a competition to identify deep learning solutions to solve this task. Challenges included training on highly imbalanced classes and predicting multiple labels per image.
Proteins bind mRNA through their entire life cycle from transcription to degradation. We analyzed c-Myc mRNA protein interactors in vivo using the HyPR-MS method to capture the crosslinked mRNA by hybridization and then analyzed the bound proteins using mass spectrometry proteomics. Using HyPR-MS, 229 c-Myc mRNA-binding proteins were identified, confirming previously proposed interactors, suggesting new interactors, and providing information related to the roles and pathways known to involve c-Myc.
The development of effective strategies for the comprehensive identification and quantification of proteoforms in complex systems is a critical challenge in proteomics. Proteoforms, the specific molecular forms in which proteins are present in biological systems, are the key effectors of biological function. Thus, knowledge of proteoform identities and abundances is essential to unraveling the mechanisms that underlie protein function.
J Proteome Res
September 2018
RNA-protein interactions are integral to the regulation of gene expression. RNAs have diverse functions and the protein interactomes of individual RNAs vary temporally, spatially, and with physiological context. These factors make the global acquisition of individual RNA-protein interactomes an essential endeavor.
Introduction: The molecular mechanisms underlying aggressive versus indolent disease are not fully understood. Recent research has implicated a class of molecules known as long noncoding RNAs (lncRNAs) in tumorigenesis and progression of cancer. Our objective was to discover lncRNAs that differentiate aggressive and indolent prostate cancers.
The Consortium for Top-Down Proteomics (CTDP) proposes a standardized notation, ProForma, for writing the sequence of fully characterized proteoforms. ProForma provides a means to communicate any proteoform by writing the amino acid sequence using standard one-letter notation and specifying modifications or unidentified mass shifts within brackets following certain amino acids. The notation is unambiguous, human-readable, and can easily be parsed and written by bioinformatic tools.
In top-down proteomics, intact proteins are analyzed by tandem mass spectrometry and proteoforms, which are defined forms of a protein with specific sequences of amino acids and localized post-translational modifications, are identified using precursor mass and fragmentation data. Many proteoforms that are detected in the precursor scan (MS1) are not selected for fragmentation by the instrument and therefore remain unidentified in typical top-down proteomic workflows. Our laboratory has developed the open source software program Proteoform Suite to analyze MS1-only intact proteoform data.
We present an open-source, interactive program named Proteoform Suite that uses proteoform mass and intensity measurements from complex biological samples to identify and quantify proteoforms. It constructs families of proteoforms derived from the same gene, assesses proteoform function using gene ontology (GO) analysis, and enables visualization of quantified proteoform families and their changes. It is applied here to reveal systemic proteoform variations in the yeast response to salt stress.
Background: Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq to improve proteomics search databases used for peptide spectral matching.