Gene mutation estimations via mutual information and Ewens sampling based CNN & machine learning algorithms.

J Appl Stat

Department of Mathematics and State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, People's Republic of China.

Published: February 2025

Article Abstract

We estimate gene mutation rates by developing a convolutional neural network (CNN) and machine learning algorithms based on mutual information and Ewens sampling. More precisely, we develop a systematic methodology by constructing a CNN, together with two machine learning algorithms for studying protein production from target gene sequences and protein structures. The core of the CNN and machine learning approach is a two-stage optimization problem that balances gene mutation rates during protein production: we seek to optimally coordinate the consistency between the given input DNA sequences and the given (or optimally computed) target sequences by controlling their intermediate gene mutation rates. The purpose is to support gene editing and protein structure prediction; for example, once the gene mutation rates are estimated, the computational complexity of protein structure prediction is reduced to a reasonable degree. Our CNN numerical optimization scheme consists of two newly designed machine learning algorithms. The stochastic gradients of the two algorithms are designed according to the Kuhn-Tucker conditions with boundary constraints, supported by Ewens sampling, multi-input multi-output (MIMO) mutual information, and codon optimization techniques. The associated learning rate bounds are derived explicitly, and both algorithms are implemented numerically. The convergence and optimality of the algorithms are proved mathematically. To illustrate the usage of our study, we also present a real-world data implementation.
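
The abstract does not spell out how Ewens sampling enters the mutation-rate estimate. As background only, and not as the authors' CNN method, the following Python sketch shows the classical use of the Ewens sampling formula for this purpose. Under the infinite-alleles model with scaled mutation rate θ (θ = 4Nμ for a diploid population of size N), a sample of n genes can be drawn with Hoppe's urn, and θ can be estimated from the number k of distinct alleles by solving E[K] = Σ_{i=0}^{n-1} θ/(θ+i) = k, which is Ewens' maximum-likelihood estimator. The function names `hoppe_urn_sample`, `expected_alleles`, and `estimate_theta` are hypothetical.

```python
import math
import random


def hoppe_urn_sample(n: int, theta: float, rng: random.Random) -> list[int]:
    """Hypothetical helper: draw allele labels for n genes from the Ewens
    distribution via Hoppe's urn.

    With i genes already drawn, the next gene carries a brand-new allele with
    probability theta / (theta + i); otherwise it copies the allele of a
    uniformly chosen earlier gene.
    """
    labels: list[int] = []
    next_label = 0
    for i in range(n):
        # For i == 0 the probability is 1, so the first gene is always new.
        if rng.random() < theta / (theta + i):
            labels.append(next_label)  # mutation: a new allele type appears
            next_label += 1
        else:
            labels.append(labels[rng.randrange(i)])  # copy an existing gene
    return labels


def expected_alleles(n: int, theta: float) -> float:
    """Expected number of distinct alleles E[K] in a sample of n genes."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta(n: int, k: int, tol: float = 1e-10) -> float:
    """Hypothetical helper: Ewens' MLE of theta, the root of E[K] = k.

    E[K] increases monotonically in theta, so bisection finds the root.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if k == 1:
        return 0.0  # a single allele type: the estimate sits at the boundary
    if k == n:
        return math.inf  # every gene distinct: the estimate diverges
    lo, hi = 0.0, 1.0
    while expected_alleles(n, hi) < k:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_alleles(n, mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(2025)
    true_theta, n = 5.0, 200
    sample = hoppe_urn_sample(n, true_theta, rng)
    k = len(set(sample))
    print(f"n={n}, distinct alleles k={k}, theta_hat={estimate_theta(n, k):.3f}")
```

Because the number of distinct alleles K is a sufficient statistic for θ under the Ewens sampling formula, the root of this equation is the maximum-likelihood estimate; bisection is a simple, safe choice because E[K] is monotone in θ.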

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416021
DOI: http://dx.doi.org/10.1080/02664763.2025.2460076

Similar Publications

Recent Advances in Gene Therapy for Hemophilia.

Clin Appl Thromb Hemost

September 2025

Pediatric Hematology Laboratory, Division of Hematology/Oncology, Department of Pediatrics, The Seventh Affiliated Hospital of Sun Yat-Sen University, Shenzhen, Guangdong, China.

Hemophilia, an X-linked monogenic disorder, arises from mutations in the genes that encode clotting factor VIII (FVIII) or clotting factor IX (FIX). As a prominent hereditary coagulation disorder, hemophilia manifests clinically as spontaneous hemorrhagic episodes. Severe cases may progress to complications such as stroke and arthropathy, significantly compromising patients' quality of life.

Transcription initiation factor TFIID subunit 1 (TAF1) is a pivotal component of the TFIID complex, critical for RNA polymerase II-mediated transcription initiation. However, the molecular basis by which TAF1 recognizes and associates with chromatin remains incompletely understood. Here, we report that the tandem bromodomain module of TAF1 engages nucleosomal DNA through a distinct positively charged surface patch on the first bromodomain (BD1).

Distinct codon usage signatures reflecting evolutionary and pathogenic adaptation in the Acinetobacter baumannii complex.

Eur J Clin Microbiol Infect Dis

September 2025

School of Bioengineering and Biosciences, Department of Biochemistry, Lovely Professional University, Punjab, 144411, India.

Purpose: This study investigates codon usage and amino acid usage bias in the genus Acinetobacter to uncover the evolutionary forces shaping these patterns and their implications for pathogenicity and biotechnology.

Methods: Codon usage patterns were examined in representative genomes of the genus Acinetobacter using standard codon bias indices, including GC content, relative synonymous codon usage (RSCU), effective number of codons (ENC), and codon adaptation index (CAI). Neutrality and parity plots were employed to evaluate the relative influence of mutational pressure and natural selection on codon preferences.
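
Two of the indices named in the Methods, GC content and RSCU, are simple enough to sketch. The Python below is a minimal illustration, assuming an in-frame coding sequence and the standard genetic code (NCBI translation table 1; bacterial table 11 uses the same amino-acid assignments). RSCU for a codon is its observed count divided by the mean count over all synonymous codons for that amino acid, so a value of 1 means no usage bias. The function names are illustrative, and the study itself may treat stop codons, ambiguous bases, or alternative genetic codes differently; ENC and CAI are not covered.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group synonymous codons by the amino acid (or stop signal) they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def rscu(seq: str) -> dict[str, float]:
    """Illustrative RSCU: count per codon / mean count of its synonymous codons.

    Codons containing ambiguous bases (e.g. 'N') are counted but ignored,
    and amino acids absent from the sequence are skipped because their
    RSCU is undefined.
    """
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    result: dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        mean = total / len(codons)
        for c in codons:
            result[c] = counts[c] / mean
    return result


if __name__ == "__main__":
    cds = "ATGGCTGCTGCCGCAAAATAA"
    print(f"GC content: {gc_content(cds):.3f}")
    for codon, value in sorted(rscu(cds).items()):
        print(codon, CODON_TABLE[codon], round(value, 2))
```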

Background and Objective: Bladder cancer (BC) is the sixth most common cancer in the U.S., with risk factors such as smoking, older age, and male sex.

Unlabelled: This report provides a detailed analysis of a singular case involving cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) in a male patient who suffered a stroke. Our investigation delves into the clinical manifestations, genetic foundations, diagnostic complexities, and prognosis associated with CADASIL. As a notable contributor to stroke occurrence in young patients, CADASIL's impact on morbidity and mortality is influenced by stroke-related complications and cognitive decline.