Publications by authors named "Zachary Ardern"

Omics technologies have led to the discovery of a vast number of proteins that are expressed but have no functional annotation, the so-called hypothetical proteins (HPs). Even in the best-studied model organism, Escherichia coli K-12, over 2% of the proteome remains uncharacterized. This knowledge gap is even wider for microbial dark matter.


Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses.
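
The idea of alternative reading frames can be made concrete with a minimal sketch that translates one DNA sequence in all six frames (three offsets on each strand). The function names and the example sequence are illustrative assumptions, not taken from the article; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; trailing 1-2 bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their amino acid sequences."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only for illustration.
    for label, protein in six_frame_translations("ATGGCCAAATAG").items():
        print(label, protein)
```

For this toy input, each of the six frames yields a different amino acid sequence, which is the property the abstract describes.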


Eukaryotic genomes are pervasively translated, but the properties of translated sequences outside of canonical genes are poorly understood. A new study in Cell Systems reveals a large translatome that is not under significant evolutionary constraint but is still an active part of diverse cellular systems.


Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials, and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods, a challenge that is growing rapidly with the advent of next-generation sequencing technologies and the enormous expansion of 'omics' data in public databases. These so-called hypothetical proteins (HPs) represent a huge knowledge gap and hidden potential for biotechnological applications.


The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium . Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing.


Background: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology.


At least six small alternative-frame open reading frames (ORFs) overlapping well-characterized SARS-CoV-2 genes have been hypothesized to encode accessory proteins. Researchers have used different names for the same ORF or the same name for different ORFs, resulting in erroneous homological and functional inferences. We propose standard names for these ORFs and their shorter isoforms, developed in consultation with the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses.

Article Synopsis
  • Understanding the emergence of new viruses hinges on thoroughly annotating their genomes, with particular attention to overlapping genes (OLGs), which are common in viruses such as SARS-CoV-2.
  • Researchers identified a novel OLG in SARS-CoV-2 that appears in Guangxi pangolin-CoVs but not in other related viruses, and they analyzed its translation and protein sequence across different evolutionary contexts.
  • This OLG has been mistakenly classified, leading to confusion in research, but it has been shown to trigger a strong antibody response in COVID-19 patients, emphasizing the critical role of OLGs in viral evolution and pandemics.

Many prokaryotic RNAs are transcribed from loci outside of annotated protein-coding genes. Across bacterial species, hundreds of short open reading frames antisense to annotated genes show evidence of both transcription and translation, for instance in ribosome profiling data. Determining the functional fraction of these protein products awaits further research, including insights from studies of molecular interactions and detailed evolutionary analysis.


Ribosome profiling (RIBO-Seq) has improved our understanding of bacterial translation, including finding many unannotated genes. However, protocols for RIBO-Seq and corresponding data analysis are not yet standardized. Here, we analyzed 48 RIBO-Seq samples from nine studies of Escherichia coli K12 grown in lysogeny broth medium and particularly focused on the size-selection step.


Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded in prokaryotic genome annotation.

Article Synopsis
  • Evidence of purifying natural selection, measured by dN/dS (the ratio of nonsynonymous to synonymous substitution rates), helps identify functional sequences in protein-coding genes.
  • Overlapping genes (OLGs) complicate this analysis since changes that are synonymous for one gene may not be for the other, making it necessary to develop new methods for evaluating these constraints.
  • The proposed tool, OLGenie, offers an enhanced method for identifying true OLGs with high accuracy and has been successfully tested on viral genomes, including a significant analysis of an HIV-1 gene, highlighting the potential for further studies in genome annotation.
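
The complication described in the synopsis, that a change synonymous in one gene may be nonsynonymous in an overlapping one, can be illustrated with a small sketch. This is not OLGenie's method or interface; the function name, the same-strand frame offsets, and the toy sequence are assumptions made purely for illustration, using the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(seq, pos, new_base, frame_offset):
    """Classify a single-base change at 0-based `pos` within one reading frame.

    `frame_offset` (0, 1, or 2) is where the frame's first codon starts on the
    given strand. Returns "synonymous", "nonsynonymous", or None when `pos`
    does not fall inside a complete codon of that frame.
    """
    if pos < frame_offset:
        return None
    start = pos - (pos - frame_offset) % 3  # first base of the affected codon
    codon = seq[start:start + 3]
    if len(codon) < 3:
        return None
    index = pos - start
    mutated = codon[:index] + new_base + codon[index + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_amino_acid else "nonsynonymous"


if __name__ == "__main__":
    # Hypothetical toy sequence; position 5 is the third base of codon GCC in frame 0.
    seq = "ATGGCCAAATAG"
    for offset in (0, 1):
        print(offset, classify_substitution(seq, 5, "T", offset))
```

In this toy case, the C-to-T change leaves alanine unchanged in the first frame (GCC to GCT) but replaces proline with leucine in the frame shifted by one base (CCA to CTA), so a single dN/dS count cannot describe both genes at once.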

The genetic code and its evolution have been studied by many different approaches. One approach is to compare the properties of the standard genetic code (SGC) to theoretical alternative codes in order to determine how optimal it is and from this infer whether or not it is likely that it has undergone a selective evolutionary process. Many different properties have been studied in this way in the literature.


Only a few overlapping gene pairs are known in the best-analyzed bacterial model organism Escherichia coli. Automatic annotation programs usually annotate only one out of six reading frames at a locus, allowing only small overlaps between protein-coding sequences. However, both RNAseq and RIBOseq show signals corresponding to non-trivially overlapping reading frames in antisense to annotated genes, which may constitute protein-coding genes.


In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined under two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure.
