Article Abstract

Cancer remains one of the leading causes of mortality worldwide, and early detection significantly improves patient outcomes and reduces treatment burden. This study investigates the application of Machine Learning (ML) techniques to predict cancer risk based on a combination of genetic and lifestyle factors. A structured dataset of 1,200 patient records was used, comprising features such as age, gender, Body Mass Index (BMI), smoking status, alcohol intake, physical activity, genetic risk level, and personal history of cancer. A full end-to-end ML pipeline was implemented, encompassing data exploration, preprocessing, feature scaling, model training, and evaluation using stratified cross-validation and a separate test set. Nine supervised learning algorithms were compared, including Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machines (SVMs), and several ensemble methods. Among these, Categorical Boosting (CatBoost) achieved the highest predictive performance, with a test accuracy of 98.75% and an F1-score of 0.9820, outperforming both traditional and other advanced models. Feature importance analysis confirmed the strong influence of cancer history, genetic risk, and smoking status on prediction outcomes. The findings highlight the effectiveness of boosting-based ensemble models in capturing complex interactions within health data and support their potential use in personalized cancer risk assessment. This research underscores the value of integrating genetic and modifiable lifestyle variables into predictive modeling to enhance early detection and preventive healthcare strategies.
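
The abstract names stratified cross-validation, accuracy, and F1-score but does not say how they were implemented. The following is a minimal standard-library sketch of those ideas, not the authors' code: the function names `stratified_folds` and `accuracy_and_f1`, the 30/70 class split, and the toy predictions are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels: 30 positive and 70 negative cases (not the study's data).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]

# Toy predictions to exercise the metrics.
accuracy, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 1])
```

Stratification matters when classes are imbalanced: each fold then sees roughly the same share of positive cases, so per-fold accuracy and F1 are comparable. In practice a library such as scikit-learn would normally be used for both steps; the hand-rolled version here only makes the mechanics visible.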

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12365227
DOI: http://dx.doi.org/10.1038/s41598-025-15656-8

Publication Analysis

Top Keywords

cancer risk: 12
machine learning: 8
early detection: 8
smoking status: 8
genetic risk: 8
risk: 5
genetic: 5
cancer: 5
predicting cancer: 4
risk machine: 4

Similar Publications

Importance: Increasingly, strategies to systematically detect melanomas invoke targeted approaches, whereby those at highest risk are prioritized for skin screening. Many tools exist to predict future melanoma risk, but most have limited accuracy and are potentially biased.

Objectives: To develop an improved melanoma risk prediction tool for invasive melanoma.

Importance: Janus kinase (JAK) inhibitors are highly effective medications for several immune-mediated inflammatory diseases (IMIDs). However, safety concerns have led to regulatory restrictions.

Objective: To compare the risk of adverse events with JAK inhibitors vs tumor necrosis factor (TNF) antagonists in patients with IMIDs in head-to-head comparative effectiveness studies.

Importance: Merkel cell carcinoma (MCC) is typically caused by the Merkel cell polyomavirus (MCPyV) and recurs in 40% of patients. Half of patients with MCC produce antibodies to MCPyV oncoproteins, the titers of which rise with disease recurrence and fall after successful treatment.

Objective: To assess the utility of MCPyV oncoprotein antibodies for early detection of first recurrence of MCC in a real-world clinical setting.