Improving protein function prediction by learning and integrating representations of protein sequences and function labels.

Bioinform Adv

Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States.

Published: August 2024



Article Abstract

Motivation: Because fewer than 1% of proteins have experimentally determined function information, computational prediction is critical for obtaining functional information for most proteins, and it has been a major challenge in protein bioinformatics. Despite significant progress by the community over the last decade, the overall accuracy of protein function prediction is still not high, particularly for rare function terms that are associated with few proteins in protein function annotation databases such as UniProt.

Results: We introduce TransFew, a new transformer model that learns representations of both protein sequences and function labels [Gene Ontology (GO) terms] to predict protein function. TransFew leverages a large pre-trained protein language model (ESM2-t48) to learn function-relevant representations of proteins from raw protein sequences. It uses a biological natural language model (BioBERT) and a graph convolutional neural network-based autoencoder to generate semantic representations of GO terms from their textual definitions and hierarchical relationships. The sequence and label representations are then combined via cross-attention to predict protein function. Integrating the protein sequence and label representations not only enhances overall function prediction accuracy but also delivers robust performance in predicting rare function terms with limited annotations by facilitating annotation transfer between GO terms.
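The cross-attention step described in the Results can be illustrated with a minimal sketch. It is not the TransFew implementation: it assumes GO-term embeddings act as queries over per-residue protein features, uses a single attention head with no learned projections, and substitutes random vectors for the ESM2 and BioBERT/autoencoder outputs. All function names, dimensions, and the final dot-product scoring are hypothetical choices made for illustration.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_go_scores(label_embeddings, protein_features):
    """Return one score in [0, 1] per GO term (hypothetical scoring head).

    Assumption: label embeddings are the queries; protein features serve
    as both keys and values.
    """
    attended = cross_attention(label_embeddings, protein_features, protein_features)
    return [sigmoid(dot(lbl, ctx)) for lbl, ctx in zip(label_embeddings, attended)]


random.seed(0)
dim = 8  # toy embedding size; real language-model embeddings are far larger
# Stand-ins for per-residue protein features (5 residues) and 3 GO-term embeddings.
protein_features = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
label_embeddings = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
scores = predict_go_scores(label_embeddings, protein_features)
```

In this arrangement each GO term produces its own view of the protein, so terms with similar embeddings attend to similar sequence features. That is one plausible way label representations could help transfer signal to rarely annotated terms, under the assumptions stated above.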

Availability And Implementation: https://github.com/BioinfoMachineLearning/TransFew.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11374024
DOI: http://dx.doi.org/10.1093/bioadv/vbae120

