Cracking the genetic code with neural networks.

Front Artif Intell

Biomechanics Research Unit, GIGA in Silico Medicine, Liège University, Liège, Belgium.

Published: April 2023


Article Abstract

The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could re-discover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with training pairs of transcripts and their proteins. We compared different Deep Learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated how a codon embedding layer, which assesses the semantic similarity between codons, affects the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins to decipher the codons of rare amino acids faster. Deep neural networks require huge amounts of data to train. Deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4-22 million cumulative codon-amino-acid pairs presented to the neural network over around 7-40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the deciphering task of the genetic code efficiently.
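
To make the training data described in the abstract concrete, the following is a minimal sketch, not the authors' pipeline, of how codon-to-amino-acid training pairs could be derived from a coding sequence using the standard genetic code. The function names `training_pairs` and `class_weights` are hypothetical, and the inverse-frequency weighting is only one assumed way to account for the unbalanced representation of amino acids; the abstract does not specify the method actually used.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def training_pairs(cds):
    """Split a coding sequence into (codon, amino acid) pairs.

    Hypothetical helper: accepts DNA or RNA letters and ignores any
    trailing bases that do not form a complete codon.
    """
    cds = cds.upper().replace("U", "T")
    pairs = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        pairs.append((codon, CODON_TABLE[codon]))
    return pairs


def class_weights(pairs):
    """Inverse-frequency weight per amino acid (an assumed scheme).

    Rare labels such as tryptophan ('W') or stop ('*') receive larger
    weights than frequent ones such as leucine ('L').
    """
    counts = Counter(aa for _, aa in pairs)
    total = sum(counts.values())
    return {aa: total / (len(counts) * n) for aa, n in counts.items()}


if __name__ == "__main__":
    pairs = training_pairs("ATGCTGCTGTGGTAA")
    print(pairs)
    print(class_weights(pairs))
```

In this toy sequence leucine appears twice while methionine, tryptophan and the stop signal each appear once, so leucine gets the smallest weight. This mirrors, at a very small scale, why rare codons need either more data or some form of reweighting to be learned reliably.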

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117997
DOI: http://dx.doi.org/10.3389/frai.2023.1128153

Similar Publications

Genomic and morphological characterization of a novel iridovirus, bivalve iridovirus 1 (BiIV1), infecting the common cockle (Cerastoderma edule).

Microb Genom

September 2025

International Centre of Excellence for Aquatic Animal Health, The Centre for Environment, Fisheries and Aquaculture Science, Weymouth, DT4 8UB, UK.

High rates of mortality of the common cockle, , have occurred in the Wash Estuary, UK, since 2008. A previous study linked the mortalities to a novel genotype of , with a strong correlation between cockle moribundity and the presence of . Here, we characterize a novel iridovirus, identified by chance during metagenomic sequencing of a gradient purification of cells, with the presence also correlated to cockle moribundity.

Genetic code expansion (GCE) technology has primarily been devoted to the introduction of noncanonical amino acids (ncAAs) into ribosomally synthesized proteins or peptides. Its potential for modifying nonribosomal natural products remains unexplored. In this study, we introduce a novel strategy that integrates GCE with the directed evolution of cyclodipeptide synthase (CDPS) to engineer a new class of CDPSs capable of biosynthesizing cyclodipeptides containing ncAAs.

Acute ischemic stroke (AIS) remains a leading cause of mortality and long-term disability globally, with survivors at high risk of recurrent stroke, cardiovascular events, and post-stroke dementia. Statins, while widely used for their lipid-lowering effects, also possess pleiotropic properties, including anti-inflammatory, endothelial-stabilizing, and neuroprotective actions, which may offer added benefit in AIS management. This article synthesizes emerging evidence on statins' dual mechanisms of action and evaluates their role in reducing recurrence, improving survival, and mitigating cognitive decline.

Motivation: Due to the intricate etiology of neurological disorders, finding interpretable associations between multiomics features can be challenging using standard approaches.

Results: We propose COMICAL, a contrastive learning approach using multiomics data to generate associations between genetic markers and brain imaging-derived phenotypes. COMICAL jointly learns omics representations utilizing transformer-based encoders with custom tokenizers.

Most methodological Polygenic Risk Score (PRS)-related papers explain the laborious process of computing the PRS in great depth. Afterwards, as a last step, it is generally described that to test a possible association between a PRS and a trait of interest, an analysis through regression models (linear or logistic, depending on data type) should be carried out adjusting for covariates (e.g.
