Article Abstract

Patients with locally advanced rectal cancer (LARC) show substantial individual variability and a pronounced class imbalance in their response to neoadjuvant chemoradiotherapy (nCRT), which makes treatment response difficult to predict. This study aimed to identify effective predictive biomarkers and to develop an ensemble learning-based model that predicts the response of LARC patients to nCRT. A two-step feature selection method was developed to identify predictive biomarkers from stable reversal gene pairs, derived through within-sample relative expression orderings (REOs) in LARC patients undergoing nCRT. Preliminary screening used four methods (MDFS, Boruta, MCFS, and VSOLassoBag) to form a candidate feature set. Secondary screening ranked these candidates by permutation importance and applied Incremental Feature Selection (IFS) with an Extreme Gradient Boosting (XGBoost) model to determine the final predictive gene pairs. The ensemble model BoostForest, which combines boosting and bagging, served as the predictive framework, and SHAP was used for interpretability. The two-step feature selection established a 32-gene pair signature (32-GPS) as the final predictive biomarker. In the test set, the model achieved an area under the precision-recall curve (AUPRC) of 0.983 and an accuracy of 0.988; in the validation cohort, it achieved an AUPRC of 0.785 and an accuracy of 0.898, indicating strong model performance. BoostForest also achieved better overall performance than Random Forest, Support Vector Machine (SVM), and XGBoost. To evaluate the effectiveness of the 32-GPS, its performance was compared with two alternative feature sets: the lasso-gene pair signature (lasso-GPS), derived through lasso regression, and the 15-shared gene pair signature (15-SGPS), consisting of the gene pairs selected by all four preliminary screening methods. The 32-GPS outperformed both.
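The abstract does not define exactly when a gene pair counts as a "stable reversal" pair. A common reading of the REO approach is sketched below, under stated assumptions: a pair (hi, lo) qualifies when gene hi is expressed above gene lo in most responders and the ordering flips in most non-responders. Each sample is then encoded as 0/1 features computed only from its own expression values, so the features are unaffected by any monotonic per-sample rescaling. The function names, the responder/non-responder split, and the 0.8 frequency cutoff are hypothetical illustrations, not the paper's published criteria.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # gene symbol -> expression value for one sample


def frac_above(samples: Sequence[Sample], hi: str, lo: str) -> float:
    """Fraction of samples in which gene `hi` is expressed strictly above gene `lo`."""
    return sum(s[hi] > s[lo] for s in samples) / len(samples)


def stable_reversal_pairs(
    responders: Sequence[Sample],
    non_responders: Sequence[Sample],
    genes: Sequence[str],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep pairs (hi, lo) with hi > lo in most responders and lo > hi in most non-responders."""
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    pairs = []
    for a, b in combinations(genes, 2):
        # Try both orientations; with threshold > 0.5 at most one can pass.
        for hi, lo in ((a, b), (b, a)):
            if (frac_above(responders, hi, lo) >= threshold
                    and frac_above(non_responders, lo, hi) >= threshold):
                pairs.append((hi, lo))
    return pairs


def encode(sample: Sample, pairs: Sequence[Tuple[str, str]]) -> List[int]:
    """Binary REO features: 1 if hi > lo within this one sample, else 0."""
    return [int(sample[hi] > sample[lo]) for hi, lo in pairs]


# Toy data (hypothetical): three responders and three non-responders over three genes.
responders = [
    {"G1": 5.0, "G2": 1.0, "G3": 3.0},
    {"G1": 6.0, "G2": 2.0, "G3": 1.0},
    {"G1": 4.0, "G2": 3.0, "G3": 2.0},
]
non_responders = [
    {"G1": 1.0, "G2": 4.0, "G3": 3.0},
    {"G1": 2.0, "G2": 5.0, "G3": 1.0},
    {"G1": 0.5, "G2": 2.0, "G3": 4.0},
]
pairs = stable_reversal_pairs(responders, non_responders, ["G1", "G2", "G3"])
# pairs → [('G1', 'G2')]
```

Only G1/G2 flips consistently between the two toy groups, so it is the single pair kept; `encode(responders[0], pairs)` gives `[1]` and `encode(non_responders[0], pairs)` gives `[0]`.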
The two-step feature selection method identified robust predictive biomarkers, and BoostForest outperformed Random Forest, Support Vector Machine, and XGBoost in classification performance and predictive capability.
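The secondary screening step can be illustrated with a dependency-free sketch. Features are ranked by permutation importance (the mean drop in a score when one feature column is shuffled), and IFS then scores the top-1, top-2, ... prefixes of that ranking and keeps the best prefix. Average precision serves as a step-wise estimate of AUPRC. The study used XGBoost inside IFS; here `toy_model` and `sum_evaluator` are hypothetical stand-ins (a fixed linear score and an unweighted sum of the selected binary features), and in practice each IFS evaluation would retrain the classifier, for example under cross-validation. All function names are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[int]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate; tied scores keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive appears
    return total / n_pos if n_pos else 0.0


def permutation_importance(predict: Callable[[Row], float], X: Sequence[Row],
                           y: Sequence[int], n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in average precision when each feature column is shuffled."""
    rng = random.Random(seed)
    base = average_precision(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(base - average_precision(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def incremental_feature_selection(
        ranked: Sequence[int],
        evaluate: Callable[[Sequence[int]], float]) -> Tuple[List[int], List[float]]:
    """Score the top-k prefixes of a ranking; return the best prefix and the score curve."""
    best_k, best_score, curve = 0, float("-inf"), []
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        curve.append(score)
        if score > best_score:  # strict comparison: ties keep the smaller feature set
            best_k, best_score = k, score
    return list(ranked[:best_k]), curve


# Toy binary gene-pair features (columns 0-2) and response labels (hypothetical).
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a fitted classifier; it ignores column 2."""
    return 2 * row[0] + row[1]


def sum_evaluator(subset: Sequence[int]) -> float:
    """Hypothetical evaluator: average precision of the sum of the selected columns."""
    return average_precision(y, [sum(row[j] for j in subset) for row in X])


importances = permutation_importance(toy_model, X, y)
ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
selected, curve = incremental_feature_selection(ranked, sum_evaluator)
```

With the fixed ranking `[0, 1, 2]`, the IFS curve on this toy data rises from 11/12 to 1.0 and then falls to 29/36 when the noisy column is added, so the first two columns are kept. Column 2 always gets an importance of exactly 0 because `toy_model` never reads it.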

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929819
DOI: http://dx.doi.org/10.1038/s41598-025-94337-y

Similar Publications

To address the increasingly limited water availability, using metal-organic frameworks (MOFs) to capture atmospheric water vapor as usable resources has emerged as a promising strategy. The adsorption characteristics of MOFs as well as their step pressure (i.e.

Background: While osteoporosis in primary hyperparathyroidism (PHPT) is widely studied, PHPT patients with osteopenia remain less characterized. This study aimed to evaluate the prevalence, biochemical features, and estimated fracture risk of osteopenic PHPT patients in a real-life cohort.

Methods: We retrospectively analyzed a consecutive series of PHPT patients with available densitometric data at three sites.

Expression of long non-coding RNAs MALAT1, MEG3, and XIST in gestational diabetes mellitus: a cross-sectional study.

Acta Diabetol

September 2025

Department of Endocrinology & Metabolism, Medical College & Hospital, Kolkata, 88, College St. College Square, Kolkata, West Bengal, 700073, India.

Background and Aims: Gestational diabetes mellitus (GDM) is defined as glucose intolerance first identified during pregnancy that does not meet the criteria for overt diabetes. Its pathophysiology shares key features with type 2 diabetes mellitus (T2D), including insulin resistance and inflammation. Emerging evidence suggests that long non-coding RNAs (lncRNAs) are implicated in T2D.

Background: The pathophysiological changes driving incident kidney cancer remain unclear. This study aimed to identify protein biomarkers and underlying mechanisms using pre-diagnostic plasma proteomics.

Materials and Methods: Among 48,851 UK Biobank participants, 165 were diagnosed with kidney cancer, and 2,911 plasma proteins were analyzed.

Non-invasive prediction of invasive lung adenocarcinoma and high-risk histopathological characteristics in resectable early-stage adenocarcinoma by [18F]FDG PET/CT radiomics-based machine learning models: a prospective cohort study.

Int J Surg

September 2025

Department of Respiratory and Critical Care Medicine, Hubei Province Clinical Research Center for Major Respiratory Diseases, Key Laboratory of Pulmonary Diseases of National Health Commission, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China

Background: Precise preoperative discrimination of invasive lung adenocarcinoma (IA) from preinvasive lesions (adenocarcinoma in situ [AIS]/minimally invasive adenocarcinoma [MIA]) and prediction of high-risk histopathological features are critical for optimizing resection strategies in early-stage lung adenocarcinoma (LUAD).

Methods: In this multicenter study, 813 LUAD patients (tumors ≤3 cm) formed the training cohort. A total of 1,709 radiomic features were extracted from the PET/CT images.