Exploiting orthology and de novo transcriptome assembly to refine target sequence information.

BMC Med Genomics

Computational Biology & Genomics, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397, Biberach an der Riss, Germany.

Published: May 2019



Article Abstract

Background: The ability to generate recombinant drug target proteins is important for drug discovery research because it facilitates the in vitro investigation of drug-target interactions. To accomplish this, the target's exact protein sequence is required. Public databases such as Ensembl, UniProt and RefSeq are extensive protein and nucleotide sequence repositories. However, many sequences for non-human organisms are predicted by computational pipelines and may thus be incomplete or incorrect. Gaps or errors in orthologous drug target sequences could therefore lead to misinterpreted experimental outcomes. Transcriptome analysis by RNA-Seq has been established as a standard method for gene expression analysis. Apart from this common application, paired-end RNA-Seq data can also be used to obtain full-coverage cDNA sequences via de novo transcriptome assembly.

Methods: To assess whether de novo transcriptome assemblies can be used to determine a protein's sequence by searching the assembly for a known orthologous sequence, we generated 3 × 6 = 18 tissue-specific assemblies (three organs: brain, kidney and liver; six species: human, mouse, rat, dog, pig and cynomolgus monkey). These assemblies and the manually curated human protein sequences from UniProtKB/Swiss-Prot were used in a reciprocal BLAST search to identify the best-matching hits. We automated and generalised our approach and present the a&o-tool, a workflow that exploits de novo assemblies of paired-end RNA-Seq data and orthology information for target sequence validation and refinement across related species. The a&o-tool extracts the sequences of the best hits from a reciprocal BLAST search, translates them into protein sequences, computes a multiple sequence alignment and quantifies the refinement.
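The reciprocal best-hit step described in the Methods can be illustrated with a minimal Python sketch. It is not the a&o-tool's implementation, which the abstract does not show. It assumes both searches were written in BLAST's default 12-column tabular format (`-outfmt 6`, bit score in the last column) and that "best" means highest bit score; the function names and the toy identifiers (`P12345`, `contig_7`, and so on) are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score.

    Rows are assumed to be BLAST tabular lines (12 tab-separated columns:
    qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore).
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[str], reverse: Iterable[str]
) -> Set[Tuple[str, str]]:
    """Return (protein, contig) pairs that are each other's best hit."""
    fwd = best_hits(forward)  # reference protein -> assembled contig
    rev = best_hits(reverse)  # assembled contig -> reference protein
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


# Toy data: two reference proteins searched against an assembly and back.
forward_rows = [
    "P12345\tcontig_7\t100.0\t250\t0\t0\t1\t250\t31\t780\t1e-150\t520",
    "P12345\tcontig_9\t85.0\t200\t30\t0\t1\t200\t10\t609\t1e-90\t310",
    "Q99999\tcontig_9\t90.0\t180\t18\t0\t1\t180\t5\t544\t1e-95\t330",
]
reverse_rows = [
    "contig_7\tP12345\t100.0\t250\t0\t0\t31\t780\t1\t250\t1e-150\t520",
    "contig_9\tP12345\t85.0\t200\t30\t0\t10\t609\t1\t200\t1e-90\t310",
]
pairs = reciprocal_best_hits(forward_rows, reverse_rows)
print(sorted(pairs))
```

In the toy data only `P12345` and `contig_7` point at each other, so `Q99999` is dropped even though it has a forward hit. That filtering is the point of the reciprocal search.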

Results: For the three human assemblies, we observed a hit rate greater than 60% with 100% sequence coverage and identity. For assemblies from the other species, we observed similar hit rates and coverage, with the highest identities for cynomolgus monkey.
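To make the reported metric concrete, the following sketch computes the fraction of reference proteins recovered with full coverage and full identity. The abstract does not give the exact definitions, so the ones here are assumptions: coverage is taken as aligned length divided by reference protein length (for a gapless alignment), and the hit rate is taken relative to all reference proteins searched. The `Hit` class and the numbers are illustrative only, not data from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    query_id: str        # reference protein identifier
    query_length: int    # length of the reference protein (residues)
    align_length: int    # aligned residues (assumed gapless alignment)
    identity_pct: float  # percent identity over the aligned region


def is_perfect(hit: Hit) -> bool:
    """True if the hit spans the whole reference protein at 100% identity."""
    return hit.align_length == hit.query_length and hit.identity_pct == 100.0


def perfect_hit_rate(hits: List[Hit], n_queries: int) -> float:
    """Fraction of searched proteins with at least one perfect hit."""
    perfect_ids = {h.query_id for h in hits if is_perfect(h)}
    return len(perfect_ids) / n_queries


# Illustrative values: four proteins searched, three recovered perfectly.
example_hits = [
    Hit("protein_A", 250, 250, 100.0),
    Hit("protein_B", 400, 400, 100.0),
    Hit("protein_C", 310, 310, 100.0),
    Hit("protein_D", 500, 420, 98.5),  # partial coverage, imperfect identity
]
rate = perfect_hit_rate(example_hits, n_queries=4)
print(f"{rate:.0%}")
```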

Conclusions: We show how to refine protein sequences using RNA-Seq data and sequence information from closely related species. With the a&o-tool, we provide a fully automated pipeline that performs the refinement, including cDNA translation and multiple sequence alignment for visual inspection. The major prerequisite for applying the a&o-tool is high-quality sequencing data.
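The cDNA translation step mentioned in the Conclusions can be sketched as follows. This is an assumption-laden illustration rather than the a&o-tool's code: it uses the standard genetic code, scans all six reading frames, and keeps the longest stretch that begins with methionine. A stretch that runs to the end of the sequence without a stop codon is also accepted. The function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate DNA from the given frame offset; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(cdna: str) -> str:
    """Return the longest Met-initiated peptide across all six frames."""
    cdna = cdna.upper()
    best = ""
    for strand in (cdna, reverse_complement(cdna)):
        for frame in range(3):
            # Stop codons ('*') split the translation into candidate segments.
            for segment in translate(strand, frame).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


# Toy contig: ATG GCT AAA TGA embedded in frame 2 of the forward strand.
protein = longest_orf("CCATGGCTAAATGACC")
print(protein)
```

The translated sequences would then be passed to a multiple sequence alignment program of choice. The abstract does not name the aligner used.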


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6533699
DOI: http://dx.doi.org/10.1186/s12920-019-0524-5

