Cracking the genetic code with neural networks.

Marc Joiret, Marine Leclercq, Gaspard Lambrechts, Francesca Rapino, Pierre Close, Gilles Louppe, Liesbet Geris

Front Artif Intell

Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium.

Published: April 2023

The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could rediscover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcripts and proteins. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated how a codon embedding layer, which assesses the semantic similarity between codons, affects the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins to decipher the codons of rare amino acids faster. Deep neural networks require huge amounts of data for training, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4-22 million cumulative codon/amino-acid pairs presented to the neural network over around 7-40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
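To make the codon-to-amino-acid learning task concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: it assumes already aligned, noise-free codon/amino-acid pairs and uses a single softmax layer over one-hot codons, whereas the study compares deep architectures trained on human transcript data. The helper names (`make_pairs`, `train`, `decode`, `accuracy`) and the synthetic sampling are hypothetical choices made for illustration only.

```python
import itertools
import math
import random

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO))   # ground truth, '*' marks a stop
CLASSES = sorted(set(AMINO))              # 20 amino acids plus stop
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}
CLASS_INDEX = {k: i for i, k in enumerate(CLASSES)}


def make_pairs(n, seed=0, weights=None):
    """Sample n synthetic (codon, amino acid) pairs.

    Assumption: `weights` is a hypothetical stand-in for codon usage;
    None means uniform sampling, which is not realistic for human data.
    """
    rng = random.Random(seed)
    codons = rng.choices(CODONS, weights=weights, k=n)
    return [(c, GENETIC_CODE[c]) for c in codons]


def train(pairs, epochs=5, lr=0.5, seed=0):
    """Train a one-layer softmax classifier with plain SGD.

    With one-hot inputs the weight matrix has one row of logits per codon.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    w = [[0.0] * len(CLASSES) for _ in CODONS]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for codon, aa in pairs:
            row = w[CODON_INDEX[codon]]
            top = max(row)
            exps = [math.exp(v - top) for v in row]   # stable softmax
            total = sum(exps)
            target = CLASS_INDEX[aa]
            for j in range(len(row)):
                # Cross-entropy gradient for logits: p_j - y_j
                grad = exps[j] / total - (1.0 if j == target else 0.0)
                row[j] -= lr * grad
    return w


def decode(w):
    """Return the learned dictionary; codons never seen map to None."""
    table = {}
    for codon, row in zip(CODONS, w):
        table[codon] = CLASSES[row.index(max(row))] if any(row) else None
    return table


def accuracy(w):
    """Fraction of the 64 codons decoded correctly."""
    table = decode(w)
    return sum(table[c] == GENETIC_CODE[c] for c in CODONS) / len(CODONS)


if __name__ == "__main__":
    weights = train(make_pairs(2000, seed=1))
    print(f"decoded {accuracy(weights):.0%} of codons")
```

In this toy setting a codon is decoded as soon as it has been seen at least once, so the training-set size needed is driven by how long it takes for the rarest codons to appear. Passing skewed `weights` to `make_pairs` gives a rough feel for why rare codons, such as the tryptophan codon or the stop codons, are the last to be deciphered.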

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117997
DOI: http://dx.doi.org/10.3389/frai.2023.1128153

Similar Publications

Genomic and morphological characterization of a novel iridovirus, bivalve iridovirus 1 (BiIV1), infecting the common cockle (Cerastoderma edule).

Microb Genom

September 2025

International Centre of Excellence for Aquatic Animal Health, The Centre for Environment, Fisheries and Aquaculture Science, Weymouth, DT4 8UB, UK.

Chantelle Hooper, Anna M Tidy, Ron Jessop, Kelly S Bateman, Matthew J Green

High rates of mortality of the common cockle, , have occurred in the Wash Estuary, UK, since 2008. A previous study linked the mortalities to a novel genotype of , with a strong correlation between cockle moribundity and the presence of . Here, we characterize a novel iridovirus, identified by chance during metagenomic sequencing of a gradient purification of cells, with the presence also correlated to cockle moribundity.

Biosynthesis of Unnatural Cyclodipeptides through Genetic Code Expansion and Cyclodipeptide Synthase Evolution.

J Am Chem Soc

September 2025

Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States.

Yu Hu, Linqi Cheng, Yijie Liu, Rui Liu, Shiyu Jason Jiang

Genetic code expansion (GCE) technology has primarily been devoted to the introduction of noncanonical amino acids (ncAAs) into ribosomally synthesized proteins or peptides. Its potential for modifying nonribosomal natural products remains unexplored. In this study, we introduce a novel strategy that integrates GCE with the directed evolution of cyclodipeptide synthase (CDPS) to engineer a new class of CDPSs capable of biosynthesizing cyclodipeptides containing ncAAs.

Statins in Acute Ischemic Stroke: Mechanisms, Resistance, and Precision Strategies for Neurovascular and Cognitive Protection.

CNS Drugs

September 2025

Global Health Neurology Lab, Sydney, NSW, 2150, Australia.

Muskaan Gupta, Ivica Smokovski, Dimitrios G Chatzis, Kevin J Spring, Man Mohan Mehndiratta

Acute ischemic stroke (AIS) remains a leading cause of mortality and long-term disability globally, with survivors at high risk of recurrent stroke, cardiovascular events, and post-stroke dementia. Statins, while widely used for their lipid-lowering effects, also possess pleiotropic properties, including anti-inflammatory, endothelial-stabilizing, and neuroprotective actions, which may offer added benefit in AIS management. This article synthesizes emerging evidence on statins' dual mechanisms of action and evaluates their role in reducing recurrence, improving survival, and mitigating cognitive decline.

A foundation model for learning genetic associations from brain imaging phenotypes.

Bioinform Adv

August 2025

IBM Research, Yorktown Heights, NY, 10598, United States.

Diego Machado Reyes, Myson Burch, Laxmi Parida, Aritra Bose

Motivation: Due to the intricate etiology of neurological disorders, finding interpretable associations between multiomics features can be challenging using standard approaches.

Results: We propose COMICAL, a contrastive learning approach using multiomics data to generate associations between genetic markers and brain imaging-derived phenotypes. COMICAL jointly learns omics representations utilizing transformer-based encoders with custom tokenizers.

Association analysis between polygenic risk scores and traits: practical guidelines and tutorial with an illustrative data set of schizophrenia.

Front Psychiatry

August 2025

Statistics Section of the Department of Genetics, Microbiology and Statistics, Universitat de Barcelona (UB), Barcelona, Spain.

Itziar Irigoien, Patricia Mas-Bermejo, Sergi Papiol, Neus Barrantes-Vidal, Araceli Rosa

Most methodological Polygenic Risk Score (PRS)-related papers explain the laborious process of computing the PRS in great depth. Afterwards, as a last step, it is generally described that to test a possible association between a PRS and a trait of interest, an analysis through regression models (linear or logistic, depending on data type) should be carried out adjusting for covariates (e.g.
