Article Abstract

Protein sequence similarity search is fundamental to biology research, but current methods are typically not able to consider crucial genomic context information indicative of protein function, especially in microbial systems. Here, we present Gaia (Genomic AI Annotator), a sequence annotation platform that enables rapid, context-aware protein sequence search across genomic datasets. Gaia leverages gLM2, a mixed-modality genomic language model trained on both amino acid sequences and their genomic neighborhoods to generate embeddings that integrate sequence-structure-context information. This approach allows for the identification of functionally and/or evolutionarily related genes that are found in conserved genomic contexts, which may be missed by traditional sequence- or structure-based search alone. Gaia enables real-time search of a curated database comprising more than 85 million protein clusters from 131,744 microbial genomes. We compare the homolog retrieval performance of Gaia search against other embedding and alignment-based approaches. We provide Gaia as a web-based, freely available tool.
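The embedding-based search the abstract describes can be illustrated with a minimal nearest-neighbor sketch. This is not Gaia's implementation: the three-dimensional vectors, the cluster names, and the helpers `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is assumed here as the ranking metric. Real gLM2 embeddings would be much higher-dimensional, and a database of tens of millions of clusters would likely need an approximate index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, database, k=2):
    """Return the k database entries most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy stand-ins for precomputed protein-cluster embeddings (made-up values).
toy_database = {
    "cluster_A": [0.9, 0.1, 0.0],
    "cluster_B": [0.1, 0.9, 0.2],
    "cluster_C": [0.8, 0.2, 0.1],
}

# Toy stand-in for the embedding of a query protein in its genomic context.
query_embedding = [1.0, 0.0, 0.0]

hits = top_k(query_embedding, toy_database, k=2)
for name, score in hits:
    print(f"{name}\t{score:.3f}")
```

The linear scan keeps the idea visible: every database entry is scored against the query, and the highest-scoring clusters are returned as candidate homologs.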

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180486
DOI: http://dx.doi.org/10.1126/sciadv.adv5109

Publication Analysis

Top Keywords

protein sequence: 12
sequence annotation: 8
genomic: 7
gaia: 6
protein: 5
search: 5
gaia ai-enabled: 4
ai-enabled genomic: 4
genomic context-aware: 4
context-aware platform: 4

Similar Publications

Superinfection exclusion (SIE) is a finely tuned virus-virus interaction mechanism closely linked to the viral infection cycle. However, the mechanistic basis of SIE remains incompletely understood in plant viruses, particularly among negative-sense, single-stranded RNA viruses. In this study, we first describe the development of an efficient reverse genetics system for the plant nucleorhabdovirus Physostegia chlorotic mottle virus (PhCMoV) by codon optimisation of the large polymerase coding sequence.

Background: Bacillus thuringiensis Cry toxins are well known for their insecticidal properties, primarily through the formation of ion-leakage pores via α4-α5 hairpins. His178 in helix 4 of the Cry4Aa mosquito-active toxin has been suggested to play a crucial role in its biotoxicity.

Objective: This study aimed to investigate the functional importance of Cry4Aa-His178 through experimental and computational analyses.

Many adenovirus (AdV) species have been isolated from human and non-human primates. Here we describe the isolation of a new AdV from a western lowland gorilla held captive in a zoo. Analysis of the genome sequence demonstrated that this virus is a member of the Mastadenovirus genus, but markedly distinct from all previously described species.

The UPF0235 UniProt family proteins are conserved across archaea, bacteria, and eukaryotes; however, they remain functionally uncharacterized. Here, we report the high resolution (1.3 Å) crystal structure of UPF0235 protein (PF1765, UniProt: Q8U052) from Pyrococcus furiosus.

Molecular characterization of Spodoptera frugiperda nose resistant to fluoxetine protein 6 and its putative involvement in tolerance to cyantraniliprole.

Pestic Biochem Physiol

November 2025

College of Plant Protection, Yangzhou University, Yangzhou 225009, China; Jiangsu Province Engineering Research Center of Green Pesticides, Yangzhou University, Yangzhou 225009, China.

Spodoptera frugiperda (FAW) is a notorious polyphagous pest that has developed resistance to various insecticides, including diamide insecticides. Our previous study established a FAW cyantraniliprole-resistant (SfCYAN-R) strain by laboratory resistance selection from a susceptible strain (SfCYAN-S); however, the mechanisms underlying FAW resistance to cyantraniliprole remain unclear. In this study, SfNrf6, which encodes nose resistant to fluoxetine (Nrf) protein 6, was found to be upregulated in the SfCYAN-R strain compared with the SfCYAN-S strain, based on RNA-Seq data and RT-qPCR.
