Genomic language models (gLMs) decode bacterial genomes for improved gene prediction and translation initiation site identification.

Brief Bioinform

Bioinformatics Laboratory, College of Computing, University Mohammed VI Polytechnic, Lot 660, Hay Moulay Rachid, Ben Guerir 43150, Morocco.

Published: July 2025


Citations: 20

Article Abstract

Accurate bacterial gene prediction is essential for understanding microbial functions and advancing biotechnology. Traditional methods based on sequence homology and statistical models often struggle with complex genetic variations and novel sequences because of their limited ability to interpret the "language of genes." To overcome these challenges, we explore genomic language models (gLMs), inspired by large language models (LLMs) in natural language processing, to enhance bacterial gene prediction. These models learn patterns and contextual dependencies within genetic sequences, similar to how LLMs process human language. We employ transformers, specifically DNABERT, for bacterial gene prediction using a two-stage framework: first, identifying coding sequence (CDS) regions, and then refining predictions by identifying the correct translation initiation sites (TIS). DNABERT is fine-tuned on a curated set of NCBI complete bacterial genomes using a k-mer tokenizer for sequence processing. Our results show that our approach, GeneLM, significantly improves gene prediction accuracy. Compared with the leading prokaryotic gene finders (Prodigal, GeneMark-HMM, and Glimmer) and other recent deep learning methods, GeneLM reduces missed CDS predictions while increasing matched annotations. More notably, our TIS predictions surpass traditional methods when tested against experimentally verified sites. GeneLM demonstrates the power of gLMs in decoding genetic information, achieving state-of-the-art performance in bacterial genome analysis. This advancement highlights the potential of language models to revolutionize genome annotation, outperforming conventional tools and enabling more precise genetic insights.
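
The abstract mentions a k-mer tokenizer but gives no parameters. The sketch below illustrates the general idea of overlapping k-mer tokenization only; the function name kmer_tokenize, the choice of k = 6, and the stride of 1 are assumptions made for illustration and are not taken from the paper or from the GeneLM code.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers taken every `stride` bases.

    Hypothetical helper: k=6 and stride=1 are illustrative assumptions,
    not values reported in the abstract.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens (empty range).
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# A 12-base sequence with k=6 and stride=1 yields 12 - 6 + 1 = 7 tokens.
tokens = kmer_tokenize("ATGGCGTACGTT", k=6)
print(tokens)
```

With a stride of 1, consecutive tokens overlap by k - 1 bases, so each token carries local context from its neighbours; a larger stride would produce fewer, less redundant tokens.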

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222049
DOI: http://dx.doi.org/10.1093/bib/bbaf311

Publication Analysis

Top Keywords

gene prediction: 20
language models: 16
bacterial gene: 12
genomic language: 8
bacterial genomes: 8
translation initiation: 8
traditional methods: 8
models: 6
bacterial: 6
gene: 6
