Genome foundation models hold transformative potential for precision medicine, drug discovery, and understanding complex biological systems. However, existing models are often inefficient, constrained by suboptimal tokenization and architectural design, and biased toward reference genomes, limiting their representation of low-abundance, uncultured microbes in the rare biosphere. To address these challenges, we developed GenomeOcean, a 4-billion-parameter generative genome foundation model trained on over 600 Gbp of high-quality contigs derived from 220 TB of metagenomic datasets collected from diverse habitats across Earth's ecosystems. A key innovation of GenomeOcean is training directly on large-scale co-assemblies of metagenomic samples, enabling enhanced representation of rare microbial species and improving generalizability beyond genome-centric approaches. We implemented a byte-pair encoding (BPE) tokenization strategy for genome sequence generation, alongside architectural optimizations, achieving up to 150× faster sequence generation while maintaining high biological fidelity. GenomeOcean excels in representing microbial species and generating protein-coding genes constrained by evolutionary principles. Additionally, its fine-tuned model demonstrates the ability to discover novel biosynthetic gene clusters (BGCs) in natural genomes and perform zero-shot synthesis of biochemically plausible, complete BGCs. GenomeOcean sets a new benchmark for metagenomic research, natural product discovery, and synthetic biology, offering a robust foundation for advancing these fields.
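As a rough illustration of the BPE idea mentioned in the abstract, the sketch below learns merges over nucleotide strings by repeatedly fusing the most frequent adjacent token pair. It is a toy under stated assumptions, not GenomeOcean's tokenizer: the function names (`learn_bpe`, `apply_merge`, `tokenize`), the tie-breaking rule, and the example sequences are all hypothetical, and the abstract does not give the model's vocabulary size or training details.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one fused token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn an ordered list of merges, starting from single nucleotides."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the whole corpus.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically larger
        # pair (an arbitrary but deterministic choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(tokens, best) for tokens in corpus]
    return merges


def tokenize(seq, merges):
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(seq)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Hypothetical toy corpus, for illustration only.
merges = learn_bpe(["ATGATGATG", "ATGCCC"], num_merges=2)
tokens = tokenize("ATGATGC", merges)
```

Because frequent substrings collapse into single tokens, a sequence is represented with fewer tokens than a one-nucleotide-per-token scheme, which is one plausible source of faster generation. The 150× figure in the abstract also reflects architectural optimizations and is not something this sketch reproduces.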
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11838515 | PMC |
http://dx.doi.org/10.1101/2025.01.30.635558 | DOI Listing |
Biol Cybern
September 2025
Department of Mechanical Science and Engineering, University of Illinois Urbana-Champaign, 61801, IL, USA.
In this article, a biophysically realistic model of a soft octopus arm with internal musculature is presented. The modeling is motivated by experimental observations of sensorimotor control where an arm localizes and reaches a target. Major contributions of this article are: (i) development of models to capture the mechanical properties of arm musculature, the electrical properties of the arm peripheral nervous system (PNS), and the coupling of PNS with muscular contractions; (ii) modeling the arm sensory system, including chemosensing and proprioception; and (iii) algorithms for sensorimotor control, which include a novel feedback neural motor control law for mimicking target-oriented arm reaching motions, and a novel consensus algorithm for solving sensing problems such as locating a food source from local chemical sensory information (exogenous) and arm deformation information (endogenous).
Stress Biol
September 2025
Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, China.
Understanding the genetic mechanism of cold adaptation in cashmere goats and dairy goats is very important to improve their production performance. The purpose of this study was to comprehensively analyze the genetic basis of goat adaptation to cold environments, clarify the impact of environmental factors on genome diversity, and lay the foundation for breeding goat breeds to adapt to climate change. A total of 240 dairy goats were subjected to genome resequencing, and the whole genome sequencing data of 57 individuals from 6 published breeds were incorporated.
Appl Microbiol Biotechnol
September 2025
School of Plant Sciences, The University of Arizona, 1140 E South Campus Drive, Forbes 303, Tucson, AZ, 85721, USA.
Fungal endophytes and epiphytes associated with plant leaves can play important ecological roles through the production of specialized metabolites encoded by biosynthetic gene clusters (BGCs). However, their functional capacity, especially in crops like lettuce (Lactuca sativa L.), remains poorly understood.
Curr Genet
September 2025
Fermentation and Microbial Biotechnology Division, CSIR-Indian Institute of Integrative Medicine, Canal Road, Jammu-Tawi, 180001, India.
Trichoderma species exhibit remarkable versatility in adaptability and in occupying habitats with lifestyles ranging from mycoparasitism and saprotrophy to endophytism. In this study, we present the first high-quality whole-genome assembly and annotation of T. lixii using Illumina HiSeq technology to explore the mechanisms of endophytic lifestyle and plant colonization.
Curr Microbiol
September 2025
Microbiology Laboratory, Department of Life Science, Kyonggi University, Suwon, Gyeonggi-Do, Republic of Korea.
A yellow-pigmented, non-motile, rod-shaped, and Gram-stain-negative bacterium was isolated from the soil of Yeongheung Island, Korea. The novel isolate, strain N803, was strictly aerobic, grew optimally at 30-35 °C, at pH 6.5, and in the presence of 0-2% NaCl.