Publications by authors named "Xiongwen Cao"

Human endogenous retroviruses (hERVs) are noninfectious molecular remnants of ancient exogenous retroviruses that now make up 8% of the human genome. The ubiquitously expressed human ERVK3-1 locus was recently annotated as encoding a 109-amino acid endogenous retroviral Rec microprotein. However, because this locus was thought to be noncoding until recently, it is currently unknown whether the ERVK3-1 microprotein has a function in human cells.

Ribosome profiling and mass spectrometry have revealed thousands of previously unannotated small and alternative open reading frames (sm/alt-ORFs) that are translated into micro/alt-proteins in mammalian cells. However, their prevalence across human tissues and their biological roles remain largely undefined. The placenta is an ideal model for identifying unannotated microproteins and alt-proteins because of the considerable protein diversity required to sustain fetal development during pregnancy.

The conserved WD40-repeat protein WDR5 interacts with multiple proteins both inside and outside the nucleus. However, it is currently unclear whether and how the distribution of WDR5 between complexes is regulated. Here, we show that an unannotated microprotein EMBOW (endogenous microprotein binder of WDR5) dually encoded in the human SCRIB gene interacts with WDR5 and regulates its binding to multiple interaction partners, including KMT2A and KIF2A.

Article Synopsis
  • Thousands of unexplored small and alternative open reading frames (smORFs and alt-ORFs) exist in mammalian genomes, yet most remain uncharacterized in terms of their molecular functions.
  • Many smORF- and alt-ORF-encoded proteins (SEPs and alt-proteins) are linked to cell proliferation, but they show little similarity to known proteins, making their biological roles hard to identify.
  • New experimental techniques that combine chemical labeling and quantitative proteomics are enhancing our ability to discover and analyze these proteins, facilitating the understanding of their functions and interactions.

RIBO-seq and proteogenomics have revealed that mammalian genomes harbor thousands of unannotated small and alternative open reading frames (smORFs, <100 amino acids, and alt-ORFs, >100 amino acids, respectively). Several dozen mammalian smORF-encoded proteins (SEPs) and alt-ORF-encoded proteins (alt-proteins) have been shown to play important biological roles, while the overwhelming majority of smORFs and alt-ORFs remain uncharacterized, particularly at the molecular level. Functional proteomics has the potential to reveal key properties of unannotated SEPs and alt-proteins in high throughput, and an approach to identify SEPs and alt-proteins undergoing regulated synthesis should be of broad utility.

Proteogenomic identification of translated small open reading frames has revealed thousands of previously unannotated, largely uncharacterized microproteins, or polypeptides of less than 100 amino acids, and alternative proteins (alt-proteins) that are co-encoded with canonical proteins and are often larger. The subcellular localizations of microproteins and alt-proteins are generally unknown but can have significant implications for their functions. Proximity biotinylation is an attractive approach to define the protein composition of subcellular compartments in cells and in animals.

Many unannotated microproteins and alternative proteins (alt-proteins) are co-encoded with canonical proteins, but few of their functions are known. Motivated by the hypothesis that alt-proteins undergoing regulated synthesis could play important cellular roles, we developed a chemoproteomic pipeline to identify nascent alt-proteins in human cells. We identified 22 actively translated alt-proteins or N-terminal extensions, one of which is post-transcriptionally upregulated by DNA damage stress.

Thousands of human small and alternative open reading frames (smORFs and alt-ORFs, respectively) have recently been annotated. Many alt-ORFs are co-encoded with canonical proteins in multicistronic configurations, but few of their functions are known. Here, we report the detection of alt-RPL36, a protein co-encoded with human RPL36.

Ribosome profiling and mass spectrometry have revealed thousands of small and alternative open reading frames (sm/alt-ORFs) that are translated into polypeptides variously termed microproteins and alt-proteins in mammalian cells. Some micro-/alt-proteins exhibit stress-, cell-type-, and/or tissue-specific expression; understanding this regulated expression will be critical to elucidating their functions. While differential translation has been inferred by ribosome profiling, quantitative mass spectrometry-based proteomics is needed for direct comparison of microprotein and alt-protein expression between samples and conditions.

Histone methyl groups can be removed by demethylases. Although LSD1 and JmjC domain-containing proteins have been identified as histone demethylases, the enzymes for many histone methylation states or sites are still unknown. Here, we screen a cDNA library encoding 2,500 nuclear proteins and identify hHR23A as a histone H4K20 demethylase.

Recent ribosome profiling and proteomic studies have revealed the presence of thousands of novel coding sequences, referred to as small open reading frames (sORFs), in prokaryotic and eukaryotic genomes. These genes have defied discovery via traditional genomic tools not only because they tend to be shorter than standard gene annotation length cutoffs, but also because they are, as a class, enriched in sequence properties previously assumed to be unusual, including non-AUG start codons. In this review, we summarize what is currently known about the incidence, efficiency, and mechanism of non-AUG start codon usage in prokaryotes and eukaryotes, and provide examples of regulatory and functional sORFs that initiate at non-AUG codons.

Introduction: Histone H3 lysine 27 trimethylation (H3K27me3) and H3 lysine 36 trimethylation (H3K36me3) are important epigenetic modifications correlated with transcriptional repression and activation, respectively. These two opposing modifications rarely co-exist on the same H3 polypeptide. However, a small but significant fraction of H3 tails in mouse embryonic stem cells (mESCs) carries five methyl groups in total across K27 and K36, and it is unclear how the trimethylation is distributed between K27 and K36.

Protein arginine methyltransferases (PRMTs) are a family of enzymes that can methylate protein arginine residues. PRMTs' substrates include histones and a variety of non-histone proteins. Previous studies have shown that yeast Hmt1 is a type I PRMT and methylates histone H4 arginine 3 and several mRNA-binding proteins.

Epigenetic modifications are thought to be important for gene expression changes during development and aging. However, apart from the Sir2 histone deacetylase in somatic tissues and H3K4 trimethylation in the germline, there is scant evidence implicating epigenetic regulation in aging. The insulin/IGF-1 signaling (IIS) pathway is a major lifespan regulatory pathway.
