Article Abstract

Motivation: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining protein function is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods that accurately predict protein function to fill this gap. Although many methods use protein sequences as input to predict function, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.

Results: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D protein structures predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.
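The abstract does not specify the exact layer TransFun uses, so the following is only a minimal, self-contained sketch of a generic E(n)-equivariant graph layer in the style of EGNN (Satorras et al., 2021), not the TransFun implementation. It assumes that residues are nodes placed at their C-alpha coordinates (e.g. from an AlphaFold2 model), that edges connect residues within a distance cutoff (10 Å here, an assumption), and that node features stand in for per-residue ESM embeddings. The names `build_radius_graph` and `ToyEGNNLayer`, the toy coordinates, and the random weights are all hypothetical. Scaling the coordinate update by each node's degree is one simple normalization choice.

```python
import math
import random


def build_radius_graph(coords, cutoff=10.0):
    """Return directed edges (i, j) between residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(coords[i], coords[j]) <= cutoff
    ]


class ToyEGNNLayer:
    """One E(n)-equivariant message-passing layer in the style of EGNN (Satorras et al., 2021).

    The random weights are hypothetical stand-ins for trained MLPs; this is not TransFun's code.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Edge network: one linear layer over [h_i, h_j, ||x_i - x_j||^2], then tanh.
        self.w_edge = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim + 1)] for _ in range(dim)]
        # Coordinate head: turns each message into one scalar weight.
        self.w_coord = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        # Node network: one linear layer over [h_i, sum_j m_ij], then tanh.
        self.w_node = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]

    @staticmethod
    def _linear(weights, vec):
        return [sum(w * v for w, v in zip(row, vec)) for row in weights]

    def __call__(self, h, x, edges):
        n, dim = len(h), len(h[0])
        agg = [[0.0] * dim for _ in range(n)]          # summed messages per node
        shift = [[0.0] * len(x[0]) for _ in range(n)]  # summed coordinate updates per node
        degree = [0] * n
        for i, j in edges:
            diff = [a - b for a, b in zip(x[i], x[j])]
            d2 = sum(c * c for c in diff)  # invariant to rotations and translations
            m_ij = [math.tanh(v) for v in self._linear(self.w_edge, h[i] + h[j] + [d2])]
            agg[i] = [a + m for a, m in zip(agg[i], m_ij)]
            # Moving along (x_i - x_j) keeps the coordinate update rotation-equivariant.
            weight = sum(w * m for w, m in zip(self.w_coord, m_ij))
            shift[i] = [s + c * weight for s, c in zip(shift[i], diff)]
            degree[i] += 1
        new_h = [[math.tanh(v) for v in self._linear(self.w_node, h[i] + agg[i])] for i in range(n)]
        new_x = [[xc + s / max(degree[i], 1) for xc, s in zip(x[i], shift[i])] for i in range(n)]
        return new_h, new_x


if __name__ == "__main__":
    # Toy 4-residue "structure" and 2-dimensional stand-ins for per-residue ESM embeddings.
    coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 1.0]]
    feats = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4], [0.2, 0.2]]
    layer = ToyEGNNLayer(dim=2)
    h1, _ = layer(feats, coords, build_radius_graph(coords))
    # Rotate the structure 90 degrees about the z-axis; node features should not change.
    rotated = [[-p[1], p[0], p[2]] for p in coords]
    h2, _ = layer(feats, rotated, build_radius_graph(rotated))
    print(max(abs(a - b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2)))
```

Because messages depend on coordinates only through squared distances, the updated node features are unchanged when the input structure is rotated or translated, while the updated coordinates rotate with it. This property is what makes equivariant layers a natural fit for predicted 3D structures, whose global orientation is arbitrary.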

Availability: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.

Contact: chengji@missouri.edu.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882282
DOI: http://dx.doi.org/10.1101/2023.01.17.524477


Similar Publications

RNA-protein interactions critically regulate gene expression and cellular processes, yet their comprehensive mapping remains challenging due to their structural diversity. We introduce PRIM-seq (protein-RNA interaction mapping by sequencing), a method for concurrent de novo identification of RNA-binding proteins and their associated RNAs. PRIM-seq generates unique chimeric DNA sequences by proximity ligation of RNAs with protein-linked DNA barcodes, which are subsequently decoded through sequencing.


The kinetics of nsp7-11 polyprotein processing and impact on complexation with nsp16 among human coronaviruses.

Nat Commun

September 2025

CSSB Centre for Structural Systems Biology, Deutsches Elektronen Synchroton DESY, Leibniz Institute of Virology, University of Lübeck, Hamburg, Germany.

In coronavirus (CoV) infection, polyproteins (pp1a/pp1ab) are processed into non-structural proteins (nsps), which largely form the replication/transcription complex (RTC). Polyprotein processing and complex formation are critical steps and offer potential therapeutic targets. However, the interplay between polyprotein processing and RTC assembly remains poorly understood.


Phase separation in innate immunity: Teleost IL6Ra's evolutionary leap against viruses.

Int J Biol Macromol

September 2025

National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), Shanghai, 201306, China; Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, Shanghai, 201306, China; International Resea…

Phase separation has been identified as a new form of regulation in innate immunity. Here, we found that IL6Ra in teleost fish has a unique intrinsically disordered region (IDR) in its amino acid sequence, distinguishing it from the IL6Ra of higher vertebrates. This feature enables IL6Ra to undergo liquid-liquid phase separation, allowing the organism to swiftly initiate an immune response at the early stages of viral infection.


Haemaphysalis leporispalustris (the rabbit tick) is one of the most broadly distributed hard tick species in the Americas. In 2018, investigators amplified DNA from a spotted fever group Rickettsia (SFGR) species found in host-seeking larvae and nymphs of H. leporispalustris collected in northern California and proposed the name Candidatus "Rickettsia lanei" using results obtained via multilocus sequence typing.


Novel plant growth-promoting endophytic bacteria, Stenotrophomonas maltophilia SaRB5, facilitate phytoremediation by plant growth and cadmium absorption in Salix suchowensis.

Ecotoxicol Environ Saf

September 2025

Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental & Resource Science, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Key Laboratory of Subtropic Soil and Plant Nutrition, Zhejiang University, Hangzhou 310058, China. Ele…

Seven plant growth-promoting bacteria (PGPB) were isolated from extracts of surface-sterilized Sedum alfredii Hance. Among the seven isolates, strain SaRB5, identified as Stenotrophomonas maltophilia through 16S rDNA sequence analysis, exhibited the highest levels of heavy metal resistance and plant growth-promoting traits. SaRB5 tolerated high concentrations of cadmium (Cd) (1. …
