Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation.

J Virol

Architecture et Fonction des Macromolécules Biologiques, Case 932, Campus de Luminy, 13288 Marseille Cedex 9, France.

Published: October 2009


Article Abstract

It is widely assumed that new proteins are created by duplication, fusion, or fission of existing coding sequences. Another mechanism of protein birth is provided by overlapping genes. They are created de novo by mutations within a coding sequence that lead to the expression of a novel protein in another reading frame, a process called "overprinting." To investigate this mechanism, we have analyzed the sequences of the protein products of manually curated overlapping genes from 43 genera of unspliced RNA viruses infecting eukaryotes. Overlapping proteins have a sequence composition globally biased toward disorder-promoting amino acids and are predicted to contain significantly more structural disorder than nonoverlapping proteins. By analyzing the phylogenetic distribution of overlapping proteins, we were able to confirm that 17 of these had been created de novo and to study them individually. Most proteins created de novo are orphans (i.e., restricted to one species or genus). Almost all are accessory proteins that play a role in viral pathogenicity or spread, rather than proteins central to viral replication or structure. Most proteins created de novo are predicted to be fully disordered and have a highly unusual sequence composition. This suggests that some viral overlapping reading frames encoding hypothetical proteins with highly biased composition, often discarded as noncoding, might in fact encode proteins. Some proteins created de novo are predicted to be ordered, however, and whenever a three-dimensional structure of such a protein has been solved, it corresponds to a fold previously unobserved, suggesting that the study of these proteins could enhance our knowledge of protein space.
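To make the idea of overprinting concrete, the following minimal sketch translates one nucleotide sequence in each of its three forward reading frames and reports, for each product, the fraction of disorder-promoting residues. It is an illustration only, not the authors' pipeline: the toy sequence, the function names, and the choice of disorder-promoting residues (A, R, G, Q, S, P, E, K, a commonly used classification) are assumptions, and the study itself relied on dedicated disorder predictors rather than a simple residue count.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: one commonly used set of disorder-promoting residues.
# The abstract does not state which classification the authors used.
DISORDER_PROMOTING = set("ARGQSPEK")


def translate(seq: str, frame: int = 0) -> str:
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2).

    RNA input is accepted: U is converted to T. Stop codons appear as '*'.
    Trailing bases that do not form a complete codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def disorder_fraction(protein: str) -> float:
    """Fraction of residues in the assumed disorder-promoting set."""
    residues = protein.replace("*", "")
    if not residues:
        return 0.0
    return sum(r in DISORDER_PROMOTING for r in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any viral genome.
    toy = "AUGGCCAAA"
    for frame in range(3):
        protein = translate(toy, frame)
        print(frame, protein, round(disorder_fraction(protein), 2))
```

The same nucleotides yield a different amino acid sequence in each frame, which is why a mutation that opens a new reading frame inside an existing gene can give rise to an entirely novel protein.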

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2753099
DOI: http://dx.doi.org/10.1128/JVI.00595-09

Similar Publications

SPACE: STRING proteins as complementary embeddings.

Bioinformatics

September 2025

Novo Nordisk Foundation Center for Protein Research, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark.

Motivation: Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting.

Results: We leveraged the STRING database of protein networks and orthology relations for 1,322 eukaryotes to generate network-based cross-species protein embeddings.

Aims/hypothesis: Severe hypoglycaemia events (SHE) remain frequent in people with type 1 diabetes despite advanced diabetes technologies. We examined whether time below range (TBR) 3.9 mmol/l (70 mg/dl; TBR70) or 3.

Going beyond SMILES enumeration for data augmentation in generative drug discovery.

Digit Discov

August 2025

Institute for Complex Molecular Systems (ICMS), Eindhoven AI Systems Institute (EAISI), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.

Data augmentation can alleviate the limitations of small molecular datasets for generative deep learning by 'artificially inflating' the number of instances available for training. SMILES enumeration - wherein multiple valid SMILES strings are used to represent the same molecules - has become particularly beneficial to improve the quality of molecule design. Herein, we investigated whether rethinking SMILES augmentation techniques could further enhance the quality of design.

Disruptions in circadian rhythm, partly controlled by the hormone melatonin, increase the risk of type 2 diabetes (T2D). Accordingly, a variant of the gene encoding the melatonin receptor 1B (MTNR1B) is robustly associated with increased risk of T2D. This single-nucleotide polymorphism (SNP; rs10830963; G-allele) is an expression quantitative trait locus (eQTL) in human pancreatic islets, conferring increased expression of MTNR1B, which is thought to perturb pancreatic β-cell function.

Impact upfront: novel format for Novo Nordisk Foundation funding.

Health Res Policy Syst

September 2025

Health Economics Research Group, Department of Health Sciences, Brunel University of London, London, United Kingdom.

Many retrospective assessments of the wider, societal impacts from health research funding use the Payback Framework or other frameworks. Much of this experience was collated in the 2018 Statement by the International School on Research Impact Assessment (ISRIA). Despite increased interest, especially in engaged research and a wider range of evaluation approaches, rarely do health and other research funders take a prospective approach and analyse the potential impact from a proposal to inform an impact management approach aimed at boosting impact.
