Article Abstract

Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, so studying PPIs is essential for understanding almost any biological process. Machine learning approaches have been explored for this task, leading to more accurate and more generalizable PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we encode biologically relevant features using pseudo amino acid composition; Moreau-Broto, Moran and Geary autocorrelation descriptors; amino acid composition position-specific scoring matrix; bi-gram position-specific scoring matrix; and composition, transition and distribution. Second, we use XGBoost to reduce feature noise and perform dimensionality reduction, selecting features by their average gain in the gradient-boosted model. Third, the resulting optimized features are fed into StackPPI, a PPI predictor built as a stacked ensemble classifier of random forest, extremely randomized trees and logistic regression. Five-fold cross-validation shows that StackPPI predicts PPIs with an ACC of 89.27%, an MCC of 0.7859 and an AUC of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, an MCC of 0.8934 and an AUC of 0.9810 on Saccharomyces cerevisiae. On independent test sets, StackPPI also improves protein interaction prediction accuracy compared with state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it a logical choice for studying the underlying mechanisms of PPIs, especially as they apply to drug design. The datasets and source code used to create StackPPI are available at https://github.com/QUST-AIBBDRC/StackPPI/.
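The abstract reports ACC, MCC and AUC under five-fold cross-validation. The following is a minimal, dependency-free sketch of how these metrics are conventionally defined; it is not the authors' evaluation code (that is in the linked repository). It assumes binary labels with 1 = interacting pair and 0 = non-interacting pair, a 0.5 score threshold chosen only for illustration, and a hypothetical `k_fold_splits` helper that uses contiguous, unshuffled folds for simplicity.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels: 1 = interacting pair, 0 = non-interacting."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold assumed for illustration
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(round(accuracy(tp, tn, fp, fn), 4))  # 0.6667
    print(round(mcc(tp, tn, fp, fn), 4))       # 0.3333
    print(round(auc(y_true, scores), 4))       # 0.8889
```

The pairwise AUC loop costs O(positives × negatives) and is only meant to make the definition explicit. In practice, scikit-learn's `accuracy_score`, `matthews_corrcoef` and `roc_auc_score`, together with a shuffled `StratifiedKFold`, would normally be used instead.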

Source
DOI: http://dx.doi.org/10.1016/j.compbiomed.2020.103899

Similar Publications

Integrative profiling of lung cancer biomarkers EGFR, ALK, KRAS, and PD-1 with emphasis on nanomaterials-assisted immunomodulation and targeted therapy.

Front Immunol

September 2025

Department of Thoracic Surgery, Shenzhen People's Hospital (The First Affiliated Hospital, Southern University of Science and Technology; The Second Clinical Medical College, Jinan University), Shenzhen, Guangdong, China.

Background: Lung cancer remains the leading cause of cancer-related mortality globally, primarily due to late-stage diagnosis, molecular heterogeneity, and therapy resistance. Key biomarkers such as EGFR, ALK, KRAS, and PD-1 have revolutionized precision oncology; however, comprehensive structural and clinical validation of these targets is crucial to enhance therapeutic efficacy.

Methods: Protein sequences for EGFR, ALK, KRAS, and PD-1 were retrieved from UniProt and modeled using SWISS-MODEL to generate high-confidence 3D structures.

Purpose: This study aimed to conduct functional proteomics across breast cancer subtypes with bioinformatics analyses.

Methods: Candidate proteins were identified using nanoscale liquid chromatography with tandem mass spectrometry (NanoLC-MS/MS) from core needle biopsy samples of early stage (0-III) breast cancers, followed by external validation with public domain gene-expression datasets (TCGA TARGET GTEx and TCGA BRCA).

Results: Seventeen proteins showed significantly differential expression, and protein-protein interaction (PPI) analysis revealed strong networks including COL2A1, COL11A1, COL6A1, COL6A2, THBS1 and LUM.

Background: Synaptic dysfunction and synapse loss occur in Alzheimer's disease (AD). The current study aimed to identify synaptic-related genes with diagnostic potential for AD.

Methods: Differentially expressed genes (DEGs) were intersected with a phenotype-associated module, selected through weighted gene co-expression network analysis (WGCNA), and with synaptic-related genes.

Background: Chronic obstructive pulmonary disease (COPD) is a chronic respiratory disease, yet the biological role of mitochondrial metabolism (MM) in COPD remains poorly understood. This study aimed to explore the underlying mechanisms of MM in COPD using bioinformatics methods.

Predicting Antibody-Antigen (Ab-Ag) docking and structure-based design represent significant long-term and therapeutically important challenges in computational biology. We present SAGERank, a general, configurable deep learning framework for antibody design using Graph Sample and Aggregate Networks. SAGERank successfully predicted the majority of epitopes in a cancer target dataset.