Proteome-wide copy-number estimation from transcriptomics.

bioRxiv

Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908.

Published: July 2023



Article Abstract

Protein copy numbers constrain systems-level properties of regulatory networks, but absolute proteomic data remain scarce compared to transcriptomics obtained by RNA sequencing. We addressed this persistent gap by relating mRNA to protein statistically, using the best available quantitative proteomics and transcriptomics data for 4366 genes in 369 cell lines. The approach starts with a central estimate of protein copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model that links mRNAs to protein. For dozens of independent cell lines and primary prostate samples, these protein inferences from mRNA outperform stringent null models, a count-based protein-abundance repository, and empirical protein-to-mRNA ratios. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein interaction complexes, suggesting that mechanistic relationships are embedded in the models. We use the method to estimate abundances of the viral receptors CD55 and CXADR from human heart transcriptomes and build 1489 systems-biology models of coxsackievirus B3 infection susceptibility. When applied to 796 RNA sequencing profiles of breast cancer from The Cancer Genome Atlas, the inferred copy numbers collectively reclassify 26% of Luminal A and 29% of Luminal B tumors. Protein-based reassignments strongly involve a pharmacologic target for luminal breast cancer (CDK4) and an α-catenin that is often undetectable at the mRNA level (CTNNA2). Thus, by adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility limits of contemporary proteomics. The collection of gene-specific models is assembled as a web tool for users seeking mRNA-guided predictions of absolute protein abundance (http://janeslab.shinyapps.io/Pinferna).
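The abstract summarizes the model-building strategy in a single sentence: start from a central estimate, then hierarchically append mRNA-protein and mRNA-mRNA dependencies. The following is a toy illustration of that general idea, not the authors' Pinferna implementation. It assumes log10-transformed abundances, ordinary least-squares fits, and leave-one-out RMSE as the selection criterion; the function names (`loo_rmse`, `ratio_baseline_rmse`, `select_gene_model`) and the synthetic data are hypothetical. It also scores a fixed protein-to-mRNA ratio baseline, one of the comparators named in the abstract.

```python
import numpy as np


def loo_rmse(X, y):
    """Leave-one-out RMSE of an ordinary least-squares fit of y on the columns of X."""
    n = len(y)
    residuals = []
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        residuals.append(y[i] - X[i] @ coef)
    return float(np.sqrt(np.mean(np.square(residuals))))


def ratio_baseline_rmse(protein, own_mrna):
    """Leave-one-out RMSE of a fixed ratio model: log protein = log mRNA + median log ratio."""
    n = len(protein)
    log_ratio = protein - own_mrna
    residuals = []
    for i in range(n):
        train = np.arange(n) != i
        residuals.append(protein[i] - (own_mrna[i] + np.median(log_ratio[train])))
    return float(np.sqrt(np.mean(np.square(residuals))))


def select_gene_model(protein, own_mrna, partner_mrna):
    """Hypothetical hierarchical model selection for one gene.

    protein, own_mrna: (n,) log10 abundances across n cell lines.
    partner_mrna: dict mapping candidate gene name -> (n,) log10 mRNA.
    Returns (model_name, leave-one-out RMSE) of the best level found.
    """
    ones = np.ones((len(protein), 1))
    # Level 0: central estimate (intercept only, i.e. the training mean).
    best = ("central", loo_rmse(ones, protein))
    # Level 1: append the gene's own mRNA (mRNA-protein dependency).
    X1 = np.column_stack([ones, own_mrna])
    err1 = loo_rmse(X1, protein)
    if err1 < best[1]:
        best = ("own_mrna", err1)
    # Level 2: append one partner mRNA on top of level 1 (mRNA-mRNA dependency).
    for name, mrna in partner_mrna.items():
        err2 = loo_rmse(np.column_stack([X1, mrna]), protein)
        if err2 < best[1]:
            best = (f"own_mrna+{name}", err2)
    return best


# Synthetic example: protein depends on its own mRNA and on one partner mRNA.
rng = np.random.default_rng(0)
n = 40
own = rng.normal(1.0, 0.5, n)
partner = rng.normal(1.0, 0.5, n)
decoy = rng.normal(1.0, 0.5, n)
protein = 2.0 + 0.8 * own + 0.5 * partner + rng.normal(0.0, 0.05, n)
name, err = select_gene_model(protein, own, {"partner": partner, "decoy": decoy})
print(name, round(err, 3), round(ratio_baseline_rmse(protein, own), 3))
```

Each level is kept only if it lowers held-out error, so a gene whose protein tracks no mRNA falls back to the central estimate. The held-out criterion matters here: an in-sample fit would always reward appending more mRNAs.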


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369941
DOI: http://dx.doi.org/10.1101/2023.07.10.548432

