Article Abstract

Motivation: Multi-label (ML) protein subcellular localization (SCL) is an indispensable means of studying protein function. It can determine where a given protein (such as the human transmembrane protein that promotes invasion by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) or expression product resides within a cell, which can inform the clinical treatment of diseases such as coronavirus disease 2019 (COVID-19).

Results: The article proposes a novel method named ML-locMLFE. First, six feature extraction methods are used to obtain effective protein information: pseudo amino acid composition, encoding based on grouped weight, gene ontology, multi-scale continuous and discontinuous descriptors, residue probing transformation and evolutionary distance transformation. Next, the ML information latent semantic index method is applied to reduce interference from redundant information. Finally, ML learning with feature-induced labeling information enrichment is adopted to predict ML protein SCL. The Gram-positive bacteria dataset is chosen as the training set, while the Gram-negative bacteria, virus, newPlant and SARS-CoV-2 datasets serve as the test sets. Under leave-one-out cross-validation, the overall actual accuracies on the first four datasets (Gram-positive bacteria, Gram-negative bacteria, virus and newPlant) are 99.23%, 93.82%, 93.24% and 96.72%, respectively. Notably, the overall actual accuracy of our predictor on the SARS-CoV-2 dataset is 72.73%. These results indicate that ML-locMLFE has clear advantages in predicting the SCL of ML proteins and offers new ideas for further research in this area.
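The abstract reports "overall actual accuracy" under leave-one-out cross-validation without defining it. In the multi-label subcellular-localization literature this metric is usually taken to mean exact-match accuracy: a protein counts as correct only when its full predicted set of locations equals its true set, with no over- or under-prediction. The Python sketch below assumes that definition and pairs it with a generic leave-one-out loop. The 1-nearest-neighbour predictor `nearest_neighbour` is a hypothetical stand-in for MLFE, and the feature vectors and location labels are made-up toy values, not data from the paper.

```python
from typing import Callable, FrozenSet, List, Sequence

LabelSet = FrozenSet[str]
Vector = List[float]


def overall_actual_accuracy(true_sets: Sequence[LabelSet],
                            pred_sets: Sequence[LabelSet]) -> float:
    """Fraction of proteins whose predicted location set equals the true set exactly
    (assumed definition of overall actual accuracy)."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true and predicted lists must have the same length")
    if not true_sets:
        raise ValueError("need at least one protein")
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)


def nearest_neighbour(train_X: List[Vector], train_Y: List[LabelSet],
                      x: Vector) -> LabelSet:
    """Hypothetical stand-in for MLFE: copy the label set of the closest
    training vector under squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(v, x)) for v in train_X]
    return train_Y[dists.index(min(dists))]


def leave_one_out_predictions(
        X: List[Vector], Y: List[LabelSet],
        fit_predict: Callable[[List[Vector], List[LabelSet], Vector], LabelSet],
) -> List[LabelSet]:
    """Hold out each protein in turn, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_Y = Y[:i] + Y[i + 1:]
        preds.append(fit_predict(train_X, train_Y, X[i]))
    return preds


# Made-up feature vectors and location sets (not from the paper).
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
Y = [frozenset({"cytoplasm"}), frozenset({"cytoplasm"}),
     frozenset({"membrane", "extracellular"}), frozenset({"membrane"})]

preds = leave_one_out_predictions(X, Y, nearest_neighbour)
print(overall_actual_accuracy(Y, preds))  # 0.5
```

In this toy run the third protein is under-predicted (one location missing) and the fourth is over-predicted (one extra location), so both count as errors under the exact-match rule and the score is 2/4 = 0.5. A metric that gave credit for partial overlap would score these two proteins more generously.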

Availability And Implementation: The source code and datasets are publicly available at https://github.com/QUST-AIBBDRC/ML-locMLFE/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8690230 (PMC)
http://dx.doi.org/10.1093/bioinformatics/btab811 (DOI)