TIS Transformer: remapping the human proteome using deep learning.

NAR Genom Bioinform

Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Oost-Vlaanderen 9000, Belgium.

Published: March 2023



Citations

20

Article Abstract

The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
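To make the prediction task concrete, the sketch below enumerates candidate translation start sites on a transcript with a naive rule: every ATG followed by an in-frame stop codon. This is not the TIS Transformer model, which learns from the nucleotide sequence rather than applying fixed rules; the function name `find_orfs` and the example sequence are hypothetical and chosen only for illustration. It shows how a single transcript can carry several open reading frames, which is the situation the abstract describes for short ORFs.

```python
# Naive rule-based enumeration of open reading frames (ORFs) on one transcript.
# Assumption: only ATG starts and the three standard stop codons are considered.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, start_codon="ATG"):
    """Return (start, end, n_codons) for each ATG with an in-frame stop codon.

    start    -- 0-based index of the first base of the start codon
    end      -- index just past the stop codon
    n_codons -- number of codons before the stop (start codon included)
    """
    # Accept RNA or DNA input in either case.
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != start_codon:
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, (pos - start) // 3))
                break
    return orfs


if __name__ == "__main__":
    example = "CCATGAAATGCCCTAGGGATGTTTTAACC"  # made-up sequence
    for start, end, n_codons in find_orfs(example):
        print(start, end, n_codons)
```

A rule like this typically yields far more candidates than are actually translated, which is why a learned model that scores each position is useful.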


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985340
DOI: http://dx.doi.org/10.1093/nargab/lqad021


Similar Publications

Accurate bacterial gene prediction is essential for understanding microbial functions and advancing biotechnology. Traditional methods based on sequence homology and statistical models often struggle with complex genetic variations and novel sequences due to their limited ability to interpret the "language of genes." To overcome these challenges, we explore genomic language models (gLMs), inspired by large language models in natural language processing, to enhance bacterial gene prediction.


A Conversation with ChatGPT on Contentious Issues in Senescence and Cancer Research.

Mol Pharmacol

April 2024

Department of Pharmacology and Toxicology, School of Medicine, Virginia Commonwealth University, Richmond, Virginia (A.M.E., D.A.G.); Department of Pharmacology and Toxicology, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt (A.M.E.); and Department of Pharmacology and Public Healt

Artificial intelligence (AI) platforms, such as Generative Pretrained Transformer (ChatGPT), have achieved a high degree of popularity within the scientific community due to their utility in providing evidence-based reviews of the literature. However, the accuracy and reliability of the information output and the ability to provide critical analysis of the literature, especially with respect to highly controversial issues, have generally not been evaluated. In this work, we arranged a question/answer session with ChatGPT regarding several unresolved questions in the field of cancer research relating to therapy-induced senescence (TIS), including the topics of senescence reversibility, its connection to tumor dormancy, and the pharmacology of the newly emerging drug class of senolytics.



The Dutch Teratology Information Service Lareb counsels healthcare professionals and patients about medication use during pregnancy and lactation. To keep the evidence up to date, employees perform a standardized weekly PubMed query where relevant literature is identified manually. We aimed to develop an accurate machine-learning algorithm to predict the relevance of PubMed entries, thereby reducing the labor-intensive task of manually screening the articles.
