PEDL: extracting protein-protein associations using deep language models and distant supervision.

Leon Weber, Kirsten Thobe, Oscar Arturo Migueles Lozano, Jana Wolf, Ulf Leser

Bioinformatics

Computer Science Department, Humboldt-Universität zu Berlin, Berlin 10099, Germany.

Published: July 2020

Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of the functional protein-protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help to gather such pathway information from biomedical publications. Current methods for extracting PPAs typically rely exclusively on scarce manually labelled data, which severely limits their performance.

Results: We propose PPA Extraction with Deep Language (PEDL), a method for predicting PPAs from text that combines deep language models and distant supervision. Because it relies on distant supervision, PEDL has access to an order of magnitude more training data than methods that rely solely on manually labelled annotations. We introduce three datasets for PPA prediction and evaluate PEDL on two subtasks: predicting PPAs between two proteins, and identifying the text spans that state the PPA. We compared PEDL with a recently published state-of-the-art model and found that, on average, PEDL performs better in both tasks on all three datasets. An expert evaluation demonstrates that PEDL can be used to predict PPAs that are missing from major pathway databases and that it correctly identifies the text spans supporting the PPA.
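
To illustrate the general idea of distant supervision mentioned above, the following is a minimal Python sketch and not PEDL's actual implementation. It assumes a toy knowledge base of known protein pairs (`KNOWN_PPAS`) and a handful of sentences with pre-recognised protein mentions (`SENTENCES`); both, along with the function names `build_bags` and `distant_labels`, are hypothetical and chosen only for this example. Sentences are grouped by the protein pair they mention, and each group is labelled positive if the pair appears in the knowledge base.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy knowledge base: unordered protein pairs with a known association.
KNOWN_PPAS = {
    frozenset({"MDM2", "TP53"}),
    frozenset({"RAF1", "MAP2K1"}),
}

# Hypothetical toy corpus: (sentence, protein mentions found by some upstream tagger).
SENTENCES = [
    ("MDM2 ubiquitinates TP53 and targets it for degradation.", ["MDM2", "TP53"]),
    ("TP53 and EGFR were both measured in the cohort.", ["TP53", "EGFR"]),
    ("RAF1 phosphorylates MAP2K1 in response to growth factors.", ["RAF1", "MAP2K1"]),
    ("MDM2 levels correlate with TP53 status.", ["TP53", "MDM2"]),
]


def build_bags(sentences):
    """Group sentences by the unordered protein pair they mention."""
    bags = defaultdict(list)
    for text, proteins in sentences:
        for a, b in combinations(sorted(set(proteins)), 2):
            bags[frozenset({a, b})].append(text)
    return bags


def distant_labels(bags, known_pairs):
    """Label a pair 1 if the knowledge base lists it, else 0 (no manual annotation)."""
    return {pair: int(pair in known_pairs) for pair in bags}


bags = build_bags(SENTENCES)
labels = distant_labels(bags, KNOWN_PPAS)

for pair, texts in bags.items():
    print(sorted(pair), "label:", labels[pair], "sentences:", len(texts))
```

Labels obtained this way are noisy: the sentence "MDM2 levels correlate with TP53 status." ends up in a positive group even though it does not state the association itself. This is one reason why identifying the text spans that actually support a PPA is a separate subtask.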

Availability and implementation: PEDL is freely available at https://github.com/leonweber/pedl. The repository also includes scripts to generate the datasets used and to reproduce the experiments from this article.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355289
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa430
