Explorer: efficient DNA coding by De Bruijn graph toward arbitrary local and global biochemical constraints.

Brief Bioinform

Center for Applied Mathematics, Tianjin University, No. 92, Weijin Road, Nankai District, Tianjin 300072, China.

Published: July 2024


Article Abstract

With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages its capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by >10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
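
To make the idea of a De Bruijn graph "characterizing local sequences" concrete, the following is a minimal sketch and not Explorer's published algorithm. It assumes that nodes are k-mers passing local checks (homopolymer run length, GC fraction within the window, forbidden motifs no longer than k) and that edges join k-mers overlapping by k-1 bases. The function names, the thresholds, and the fixed rate of one bit per nucleotide are all hypothetical simplifications chosen for illustration; global constraints such as whole-strand GC content are not handled here.

```python
from itertools import product

BASES = "ACGT"

def max_run(seq):
    """Length of the longest homopolymer run in seq."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best

def kmer_ok(kmer, max_homopolymer=2, gc_range=(0.25, 0.75), motifs=()):
    """Local constraint check on one k-mer window (illustrative thresholds)."""
    gc = sum(b in "GC" for b in kmer) / len(kmer)
    return (max_run(kmer) <= max_homopolymer
            and gc_range[0] <= gc <= gc_range[1]
            and not any(m in kmer for m in motifs))

def build_graph(k, **limits):
    """De Bruijn graph over valid k-mers; keep only nodes with >= 2 successors."""
    nodes = {"".join(p) for p in product(BASES, repeat=k)}
    nodes = {n for n in nodes if kmer_ok(n, **limits)}
    while True:
        graph = {n: [n[1:] + b for b in BASES if n[1:] + b in nodes]
                 for n in nodes}
        weak = {n for n, succ in graph.items() if len(succ) < 2}
        if not weak:
            return graph
        nodes -= weak  # removing nodes may weaken others, so repeat

def encode(bits, graph, start):
    """Walk the graph: each bit selects the first or second successor."""
    seq, node = start, start
    for bit in bits:
        node = graph[node][int(bit)]
        seq += node[-1]
    return seq

def decode(seq, graph, k):
    """Recover bits by replaying the walk and reading each chosen index."""
    bits = []
    for i in range(len(seq) - k):
        node, nxt = seq[i:i + k], seq[i + 1:i + k + 1]
        bits.append(str(graph[node].index(nxt)))
    return "".join(bits)

if __name__ == "__main__":
    g = build_graph(4)
    strand = encode("1011001110", g, min(g))
    print(strand, decode(strand, g, 4))
```

Because every k-mer window of the output is a node of the graph, any strand produced by `encode` satisfies the local constraints by construction. Using only two successors per node caps this sketch at 1 bit per nucleotide; a practical coder would exploit all available successors to approach the capacity of the constrained graph.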

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11285171
DOI: http://dx.doi.org/10.1093/bib/bbae363

Publication Analysis

Top Keywords

biochemical constraints: 16
dna coding: 8
bruijn graph: 8
encoding efficiency: 8
algorithm based: 8
decoding efficiency: 8
dna: 6
storage: 6
decoding: 6
explorer: 4

Similar Publications

We previously demonstrated lipid nanoparticle-mediated CRISPR-Cas9 gene editing to disrupt the gene encoding cytochrome P450 oxidoreductase (Cypor), combined with transient administration of acetaminophen (APAP), to repopulate the liver with healthy hepatocytes and rescue a phenylketonuria mouse model. This study aimed to investigate electroporation-mediated delivery of -targeting CRISPR-Cas9 ribonucleoproteins into wild-type hepatocytes, combined with liver engraftment under APAP treatment, as an in vivo selection approach in a mouse model of homozygous familial hypercholesterolemia (). Electroporation provides higher delivery efficiency compared to lipid nanoparticles.

Design and synthesis of functionally active artificial proteins is challenging, as it requires simultaneous consideration of interconnected factors, such as fold, dynamics, and function. These evolutionary constraints are encoded in protein sequences and can be learned through the latent generative landscape (LGL) framework to predict functional sequences by leveraging evolutionary patterns, enabling exploration of uncharted sequence space. By simulating designed proteins through molecular dynamics (MD), we gain deeper insights into the interdependencies governing structure and dynamics.

Effect of R-18 on Maize Growth Promotion Under Salt Stress.

Microorganisms

July 2025

Ningxia Key Laboratory for the Development and Application of Microbial Resources in Extreme Environments, College of Biological Science and Engineering, North Minzu University, Yinchuan 750021, China.

Soil salinization poses a significant constraint to agricultural productivity. However, certain plant growth-promoting bacteria (PGPB) can mitigate salinity stress and enhance crop performance. In this study, a bacterial isolate, R-18, isolated from saline-alkali soil in Ningxia, China, was identified as based on 16S rRNA gene sequencing.

DprA (also known as Smf) is a conserved RecA mediator originally characterized by its role in natural chromosomal transformation, yet its widespread presence across bacteria hints at broader DNA metabolic functions. Here, we demonstrate that DprA enhances the frequency of Hfr conjugation in vivo. In vitro, RecA·ATP binds and cooperatively polymerizes in a 50-nucleotide (nt) polydeoxy T (dT) ssDNA to form dynamic filaments that SSB inhibits, an effect fully reversed by DprA.

The metallicolous populations of the facultative Tl hyperaccumulator Silene latifolia are extraordinarily tolerant and capable of accumulating up to 80,000 μg Tl g in nature. A growth stimulatory effect of Tl was observed, and this study set out to determine possible mechanisms. Plants from non-metallicolous and metallicolous populations were subjected to hydroponics dosing experiments at 2.
