Motivation: Genome and transcriptome sequencing projects have generated millions of protein sequences. However, experimentally determining protein function remains time-consuming, low-throughput, and expensive, which has led to a large gap between known sequences and known functions. Computational methods that accurately predict protein function are therefore needed to fill this gap. Although many methods predict function from protein sequences, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.
Results: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.
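As a rough illustration of the idea of passing per-residue embeddings through an equivariant graph layer over a 3D structure, the following is a minimal sketch only, not TransFun's implementation. It assumes a k-nearest-neighbour residue graph (the abstract does not say how the graph is built), uses toy feature vectors in place of ESM embeddings and toy coordinates in place of AlphaFold2 C-alpha positions, and replaces learned networks with a fixed distance-based gate. The names `knn_edges` and `egnn_layer` are hypothetical.

```python
import math


def knn_edges(coords, k):
    """Return directed edges (i, j) where j is one of the k nearest residues to i."""
    edges = []
    for i, ci in enumerate(coords):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def egnn_layer(feats, coords, edges, step=0.1):
    """One simplified equivariant update (assumption: fixed gate, no learned weights).

    Features are updated only from rotation/translation-invariant quantities
    (squared distances), and coordinates move only along relative position
    vectors, so rotating or translating the input transforms the output
    coordinates the same way and leaves the output features unchanged.
    """
    n, dim = len(feats), len(feats[0])
    msg_sum = [[0.0] * dim for _ in range(n)]
    shift = [[0.0, 0.0, 0.0] for _ in range(n)]
    degree = [0] * n
    for i, j in edges:
        d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        gate = 1.0 / (1.0 + d2)  # invariant scalar standing in for a learned edge network
        for c in range(dim):
            msg_sum[i][c] += gate * feats[j][c]
        for a in range(3):
            shift[i][a] += gate * (coords[i][a] - coords[j][a])
        degree[i] += 1
    new_feats = [
        [math.tanh(feats[i][c] + msg_sum[i][c] / max(degree[i], 1)) for c in range(dim)]
        for i in range(n)
    ]
    new_coords = [
        [coords[i][a] + step * shift[i][a] / max(degree[i], 1) for a in range(3)]
        for i in range(n)
    ]
    return new_feats, new_coords


# Toy input: 5 "residues" with 2-dimensional stand-in embeddings (not real ESM output).
toy_coords = [
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [5.0, 3.0, 0.5],
    [2.0, 5.5, 1.0],
    [-1.0, 4.0, 3.0],
]
toy_feats = [[0.1 * i, 1.0 - 0.2 * i] for i in range(5)]
toy_edges = knn_edges(toy_coords, k=2)
out_feats, out_coords = egnn_layer(toy_feats, toy_coords, toy_edges)
```

In a real model the gate and the feature update would be learned networks, and a pooled representation of the final residue features would feed a classifier over function terms; those parts are omitted here.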
Availability: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.
Contact: chengji@missouri.edu.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882282
DOI: http://dx.doi.org/10.1101/2023.01.17.524477
Nat Biotechnol
September 2025
Institute of Engineering in Medicine, University of California, San Diego, La Jolla, CA, USA.
RNA-protein interactions critically regulate gene expression and cellular processes, yet their comprehensive mapping remains challenging due to their structural diversity. We introduce PRIM-seq (protein-RNA interaction mapping by sequencing), a method for concurrent de novo identification of RNA-binding proteins and their associated RNAs. PRIM-seq generates unique chimeric DNA sequences by proximity ligation of RNAs with protein-linked DNA barcodes, which are subsequently decoded through sequencing.
Nat Commun
September 2025
CSSB Centre for Structural Systems Biology, Deutsches Elektronen-Synchrotron DESY, Leibniz Institute of Virology, University of Lübeck, Hamburg, Germany.
In coronavirus (CoV) infection, polyproteins (pp1a/pp1ab) are processed into non-structural proteins (nsps), which largely form the replication/transcription complex (RTC). Polyprotein processing and complex formation are critical and offer potential therapeutic targets. However, the interplay of polyprotein processing and RTC assembly remains poorly understood.
Int J Biol Macromol
September 2025
National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), Shanghai, 201306, China; Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, Shanghai, 201306, China; International Resea
Phase separation has been discovered as a new form of regulation in innate immunity. Here, we found that IL6Ra in teleost fish has a unique intrinsically disordered region (IDR) in its amino acid sequence, distinguishing it from the IL6Ra of higher vertebrates. This unique feature endows IL6Ra with the ability to undergo liquid-liquid phase separation, enabling the organism to swiftly initiate an immune response at the early stages of viral infection.
Am J Trop Med Hyg
September 2025
Rickettsial Zoonoses Branch, Centers for Disease Control and Prevention, Atlanta, Georgia.
Haemaphysalis leporispalustris (the rabbit tick) is one of the most broadly distributed hard tick species in the Americas. In 2018, investigators amplified DNA from a spotted fever group Rickettsia (SFGR) species found in host-seeking larvae and nymphs of H. leporispalustris collected in northern California and proposed the name Candidatus "Rickettsia lanei" using results obtained via multilocus sequence typing.
Ecotoxicol Environ Saf
September 2025
Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental & Resource Science, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Key Laboratory of Subtropic Soil and Plant Nutrition, Zhejiang University, Hangzhou 310058, China. Ele
Seven plant growth-promoting bacteria (PGPB) were isolated from extracts of surface-sterilized Sedum alfredii Hance. Among the seven isolates, strain SaRB5, identified as Stenotrophomonas maltophilia through 16S rDNA sequence analysis, exhibited the highest levels of heavy metal resistance and plant growth-promoting traits. SaRB5 tolerated high concentrations of cadmium (Cd) (1.