PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine.

Bioinformatics

Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.

Published: April 2018


Article Abstract

Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought.

Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. On an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, reaching an overall accuracy of 74% and a Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed that higher fractions of exposed residues associate positively with protein solubility, whereas tripeptide stretches with multiple histidines associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability.
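As an illustration of two quantities named in the abstract, the following minimal sketch computes tripeptide frequencies from a sequence and evaluates binary predictions with accuracy and the Matthews correlation coefficient. It is not the PaRSnIP implementation: the function names `tripeptide_frequencies` and `accuracy_and_mcc`, the label convention (1 = soluble, 0 = insoluble), and the example inputs are assumptions made here for illustration only.

```python
from collections import Counter
from math import sqrt


def tripeptide_frequencies(sequence):
    """Relative frequency of each overlapping tripeptide in a sequence.

    Hypothetical helper for illustration; the feature set actually used by
    PaRSnIP is described in the paper and its repository.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels.

    Assumes labels are 1 (soluble) or 0 (insoluble) and that the two
    sequences are non-empty and of equal length.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # A made-up sequence with a histidine-rich stretch.
    print(tripeptide_frequencies("MHHHHA"))
    # Made-up labels and predictions.
    print(accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Frequencies of this kind could serve as inputs to a gradient boosting classifier, and the second function returns the two metrics the abstract reports for the independent test set.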

Availability and implementation: PaRSnIP software is available for download from GitHub (https://github.com/RedaRawi/PaRSnIP).

Contact: gwo-yu.chuang@nih.gov.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6031027
DOI: http://dx.doi.org/10.1093/bioinformatics/btx662
