Article Abstract

Background: Disease classification using 16S rRNA microbiome data is challenged by high dimensionality, compositionality, and sparsity, compounded by the small sample sizes inherent to many studies. Machine learning and feature selection techniques have the potential to identify robust biomarkers and improve classification performance; however, their comparative effectiveness across diverse methods and datasets remains insufficiently explored. This study evaluates multiple feature selection techniques alongside normalization strategies, focusing on their interplay with classifier performance.
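The abstract does not say how each normalization was implemented. As a rough, hypothetical illustration, the three per-sample transformations the study compares (relative abundance, centered log-ratio (CLR), and presence-absence) can be sketched in plain Python. The function names are invented for this example, and the pseudocount of 1 used to avoid log(0) in CLR is an assumption, not a documented choice of the authors.

```python
import math


def relative_abundance(counts):
    """Scale one sample's taxon counts so they sum to 1 (total-sum scaling)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log(count + pseudocount) minus the sample's mean log.

    The pseudocount is an assumption made here so that zero counts
    (common in sparse 16S data) do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def presence_absence(counts):
    """1 if the taxon was observed in the sample, otherwise 0."""
    return [1 if c > 0 else 0 for c in counts]


sample = [0, 3, 7, 0, 90]  # hypothetical read counts for five taxa in one sample
print(relative_abundance(sample))
print([round(v, 3) for v in clr(sample)])
print(presence_absence(sample))
```

CLR values within a sample sum to zero, which maps the constrained proportions onto unconstrained real-valued log-ratios. Presence-absence, by contrast, keeps only whether each taxon was detected. The choice of pseudocount can change CLR values noticeably on very sparse data.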

Results: Our analyses revealed that centered log-ratio (CLR) normalization improves the performance of logistic regression and support vector machine models and facilitates feature selection, whereas random forest models yield strong results using relative abundances. Interestingly, presence-absence normalization achieved performance similar to that of abundance-based transformations across classifiers. Among feature selection methods, minimum redundancy maximum relevancy (mRMR) surpassed most methods in identifying compact feature sets and performed comparably to the least absolute shrinkage and selection operator (LASSO), which obtained top results with lower computation times. Autoencoders needed larger latent spaces to perform well and lacked interpretability, mutual information suffered from redundancy, and ReliefF struggled with data sparsity.
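To make the contrast between mRMR and plain mutual-information ranking concrete, here is a minimal greedy mRMR sketch in the difference form: at each step it picks the feature with the highest relevance to the label minus its mean mutual information with already selected features. It assumes discrete features (for example presence-absence) and a simple plug-in mutual-information estimate. The study's actual mRMR variant and estimator are not given in the abstract, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): return up to k selected column indices.

    features: list of columns, each a list of discrete values.
    score(j) = I(f_j; y) - mean over selected s of I(f_j; f_s).
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = set(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]  # first pick: pure relevance
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy presence/absence data (hypothetical): f0 and f1 are identical copies,
# f2 carries signal independent of f0, and f3 is unrelated to the label.
labels = [0, 0, 1, 1, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
f3 = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr([f0, f1, f2, f3], labels, k=2))
```

Ranking by relevance alone would rate f0, f1 and f2 equally and could keep both duplicate copies. The redundancy term penalizes the copy once its twin is selected, which is the behaviour that lets mRMR return compact feature sets.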

Conclusions: Overall, feature selection pipelines improved model focus and robustness through a drastic reduction of the feature space. mRMR and LASSO emerged as the most effective methods across datasets.
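LASSO selects features by driving some coefficients exactly to zero. For a binary disease label, one common form is L1-penalized logistic regression. The abstract does not state the solver or regularization settings, so the following is only an assumption-laden sketch: proximal gradient descent (ISTA) in plain Python, with hypothetical function names, toy data, and an arbitrary penalty strength `lam`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|w|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic_regression(X, y, lam=0.1, step=0.5, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    X: list of rows (lists of numbers); y: list of 0/1 labels.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        # Residuals of the smooth part: predicted probability minus label.
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
             for row, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        grad_b = sum(r) / n
        # Gradient step, then soft-thresholding (the L1 proximal step) on w only.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
        b -= step * grad_b
    return w, b


# Toy data (hypothetical): column 0 is informative, column 1 is pure noise.
X = [
    [1, 1], [1, -1], [1, 1], [1, -1],      # x0 = 1, y = 1
    [1, 1], [1, -1],                       # x0 = 1, y = 0
    [-1, 1], [-1, -1], [-1, 1], [-1, -1],  # x0 = -1, y = 0
    [-1, 1], [-1, -1],                     # x0 = -1, y = 1
]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
w, b = l1_logistic_regression(X, y, lam=0.1)
print(w, b)
```

The features with non-zero entries in `w` form the selected set; increasing `lam` zeroes out more of them. In practice features would usually be standardized and `lam` tuned by cross-validation.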


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12402773
DOI: http://dx.doi.org/10.1093/gigascience/giaf096
