SPACE: STRING proteins as complementary embeddings.

Bioinformatics

Novo Nordisk Foundation Center for Protein Research, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark.

Published: September 2025


Article Abstract

Motivation: Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting.

Results: We leveraged the STRING database of protein networks and orthology relations for 1,322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we validated the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings combined with sequence embeddings improved accuracy over using sequence embeddings alone, reaching performance close to that of state-of-the-art deep-learning methods.
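
The idea of aligning species-specific embeddings through orthologs can be illustrated with a minimal sketch. This is not the alignment procedure used in SPACE, which the abstract does not specify; it swaps in orthogonal Procrustes alignment as a simple stand-in, runs on synthetic data, and uses hypothetical names throughout (emb_a, emb_b, ortholog_idx, orthogonal_procrustes_align).

```python
import numpy as np


def orthogonal_procrustes_align(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def mean_cosine(x, y):
    """Mean row-wise cosine similarity between two embedding matrices."""
    num = np.sum(x * y, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
dim, n_proteins = 8, 50

# Synthetic stand-in for a species-A network embedding (assumption).
emb_a = rng.normal(size=(n_proteins, dim))

# Species B lives in a rotated coordinate system, plus a little noise.
# Row i of emb_b is treated as the ortholog of row i of emb_a.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_b = emb_a @ hidden_rotation + 0.01 * rng.normal(size=(n_proteins, dim))

# Use only the first 30 ortholog pairs to fit the alignment.
ortholog_idx = np.arange(30)
rotation = orthogonal_procrustes_align(emb_b[ortholog_idx], emb_a[ortholog_idx])
emb_b_aligned = emb_b @ rotation

# Evaluate on the 20 held-out pairs that were not used for fitting.
held_out = np.arange(30, n_proteins)
before = mean_cosine(emb_b[held_out], emb_a[held_out])
after = mean_cosine(emb_b_aligned[held_out], emb_a[held_out])
print(f"held-out cosine similarity before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the two species differ only by a rotation, so a single orthogonal map fitted on a subset of ortholog pairs also brings the held-out pairs into agreement. Real species-specific network embeddings differ in more than a rotation, which is why this should be read as an illustration of the goal rather than of the published method.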

Availability And Implementation: The source code and scripts for generating the network-based cross-species protein embeddings are available at https://github.com/deweihu96/SPACE. Precomputed network embeddings and sequence embeddings for all eukaryotic proteins are included in STRING version 12.0 (https://string-db.org/cgi/download).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/btaf496
