Inferring protein function is a fundamental and long-standing problem in biology. Laboratory experiments in this field are often expensive, so large-scale computational inference of protein function from readily available amino acid sequences is needed to understand in more detail the mechanisms underlying biological processes in living organisms. Recent studies have used ideas from natural language processing and self-supervised learning to derive features from protein sequence information. In language modelling, representations learnt through self-supervised pre-training have been shown to capture the semantic information of words well for downstream applications. In this study, we tested sequence-based protein representations, learnt by self-supervised pre-training on a large protein database, on multiple protein inference tasks. We show that simple baseline representations in the form of bag-of-words histograms perform better than those based on self-supervised learning on sequence similarity and protein inference tasks. Using feature selection, we show that the top discriminant features help bag-of-words capture important information for data-driven function prediction. These findings could have important implications for self-supervised learning models on protein sequences, and might encourage the consideration of alternative pre-training schemes for learning representations that capture more meaningful biological information from the sequence alone.
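To make the bag-of-words baseline concrete, here is a minimal sketch in Python. It assumes the "words" are overlapping k-mers over the 20 standard amino acids and that two sequences are compared by the cosine similarity of their histograms. The abstract does not state the study's actual tokenisation, word length, or similarity measure, so the function names and the parameter `k` are illustrative assumptions only.

```python
from collections import Counter
from itertools import product

# Assumption: vocabulary built from the 20 standard amino acid letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def bow_histogram(sequence, k=1):
    """Return a normalised k-mer histogram, ordered by a fixed vocabulary.

    k-mers containing non-standard residues (e.g. 'X') are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the study.
    h1 = bow_histogram("MKTAYIAKQR", k=2)
    h2 = bow_histogram("MKTAYIAKQL", k=2)
    print(len(h1), cosine_similarity(h1, h2))
```

With `k=1` the histogram is simply the amino acid composition (20 bins). Larger `k` grows the vocabulary to 20^k bins and captures short local motifs, at the cost of sparser vectors.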
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12327643 | PMC |
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0325531 | PLOS |
Genetica
September 2025
Faculty of Fisheries and Aquaculture Sciences, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu, Malaysia.
Population genetics plays a critical role in shaping policies for fisheries management, conservation, and aquaculture development. The golden snapper, Lutjanus johnii (Bloch, 1792), is a commercially important snapper species that is also valuable for aquaculture. This study used the mitochondrial markers D-loop (151 specimens) and Cytochrome b (Cyt-b, 120 specimens) from 10 populations, including populations from the east South China Sea, the west South China Sea and the Strait of Malacca, to investigate the genetic diversity, population connectivity, and historical demography of L.
mBio
September 2025
School of Biological Sciences, University of Auckland, Auckland, New Zealand.
The rotation of the bacterial flagellum is powered by the MotAB stator complex, which converts ion flux into torque. Despite its central role in flagellar function, the evolutionary origin and structural diversity of this system remain poorly understood. Here, we present the first comprehensive phylogenetic and structural characterization of MotAB and its closest non-flagellar homologs.
Mol Omics
September 2025
Division of Animal Sciences, University of Missouri, 920 East Campus Drive, Columbia, Missouri 65211, USA.
Mice lacking caveolin-1, a major protein of the lipid rafts of the plasma membrane, show deregulated cellular proliferation of the mammary gland and abnormal fetoplacental communication during pregnancy. This study leverages a multi-omics approach to test the hypothesis that the absence of caveolin-1 elicits a coordinated crosstalk of genes among the mammary gland, placenta and fetal brain in pregnant mice. Integrative analysis of metabolomics and transcriptomics data from mammary glands showed that the loss of caveolin-1 significantly impacted specific metabolites and metabolic pathways in the pregnant mice.
Proc Natl Acad Sci U S A
September 2025
Martin A. Fisher School of Physics, Brandeis University, Waltham, MA 02453.
Programmable self-assembly has recently enabled the creation of complex structures through precise control of the interparticle interactions and the particle geometries. Targeting ever more structurally complex, dynamic, and functional assemblies necessitates going beyond the design of the structure itself, to the measurement and control of the local flexibility of the intersubunit connections and its impact on the collective mechanics of the entire assembly. In this study, we demonstrate a method to infer the mechanical properties of multisubunit assemblies using cryogenic electron microscopy (cryo-EM) and RELION's multi-body refinement.
Mol Biol Rep
September 2025
ICAR-Central Institute of Fisheries Education, Versova, Mumbai, 400061, India.
Background: Labeo fimbriatus (Bloch, 1795) is a medium-sized South Asian minor carp with ecological significance and emerging aquaculture potential, particularly in polyculture systems with Indian major carps. Despite its wide distribution, it remains underrepresented in phylogenetic studies, and limited genomic resources are available. Here, we report the complete mitochondrial genome sequence of L.