Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules.

BMC Med Inform Decis Mak

The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University, Guangzhou, 511436, China.

Published: July 2025


Article Abstract

Background: Pulmonary nodules (PNs) are generally considered an early manifestation of lung cancer. PNs that remain stable for more than two years, or whose pathology results rule out lung cancer, are considered benign PNs (BPNs), whereas PNs that follow a tumor growth pattern, or whose pathology results indicate lung cancer, are considered malignant PNs (MPNs). Currently, more than 90% of PNs detected by screening tests are benign, with a false-positive rate of up to 96.4%. Although a range of predictive models have been developed to identify MPNs, distinguishing BPNs from MPNs remains challenging.

Methods: We included a total of 5197 patients in this case-control study according to the preset exclusion criteria and sample size. Among them, 4735 with BPNs and 2509 with MPNs were randomly divided into training, validation, and test sets in a 7:1.5:1.5 ratio. Three widely used machine learning algorithms (Random Forest, Gradient Boosting Machine, and XGBoost) were used to screen the candidate metrics, the corresponding predictive models were then constructed using discriminant analysis, and the best-performing model was selected as the target model. The model was internally validated with 10-fold cross-validation and compared with the PKUPH and Block models.
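The 7:1.5:1.5 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract says only that patients were "randomly divided", so the per-class (stratified) shuffling, the function name `stratified_split`, the seed, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep class proportions.

    Hypothetical helper: stratifying by class is an assumption; the
    abstract only states that the split was random with a 7:1.5:1.5 ratio.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels (made-up data): 0 = benign, 1 = malignant.
labels = [0] * 120 + [1] * 80
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class keeps the benign-to-malignant ratio roughly equal across the three sets, which matters when the reported metric is an AUC estimated on a held-out set.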

Results: We collated information from chest CT examinations performed from 2018 to 2021 in the physical examination population and found that the detection rate of PNs was 21.57%, with an overall upward trend. The GMU_D model, constructed by discriminant analysis on the features screened by machine learning, showed excellent discriminative performance (AUC = 0.866, 95% CI: 0.858-0.874), outperforming the PKUPH model (AUC = 0.559, 95% CI: 0.552-0.567) and the Block model (AUC = 0.823, 95% CI: 0.814-0.833). The cross-validation results also showed excellent performance (AUC = 0.866, 95% CI: 0.858-0.874).
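For readers unfamiliar with how an AUC and its 95% CI are obtained, the sketch below computes the AUC as the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one, and estimates a percentile bootstrap interval. The abstract does not state how its intervals were derived, so the bootstrap is an assumption; the function names, the seed, and the simulated scores are hypothetical as well.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison: P(positive score > negative score).

    Ties count as 0.5. Labels are 1 (malignant) and 0 (benign).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated model scores (made-up data): malignant cases score higher on average.
rng = random.Random(1)
labels = [0] * 30 + [1] * 30
scores = [rng.gauss(0, 1) for _ in range(30)] + [rng.gauss(1, 1) for _ in range(30)]

point = auc(scores, labels)
ci_low, ci_high = bootstrap_ci(scores, labels)
print(f"AUC = {point:.3f}, 95% CI: {ci_low:.3f}-{ci_high:.3f}")
```

The pairwise form is quadratic in the sample size, which is fine for a toy example; on a cohort of several thousand patients a rank-based implementation would be the more practical choice.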

Conclusion: The detection rate of PNs was 21.57% in the physical examination population undergoing chest CT. In addition, based on a real-world study of PNs, an improved prediction tool was developed and validated that distinguishes BPNs from MPNs with excellent predictive and discriminative performance.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12275455
DOI: http://dx.doi.org/10.1186/s12911-025-03067-8

Similar Publications

Introduction: Vision language models (VLMs) combine image analysis capabilities with large language models (LLMs). Because of their multimodal capabilities, VLMs offer a clinical advantage over image classification models for the diagnosis of optic disc swelling by allowing a consideration of clinical context. In this study, we compare the performance of non-specialty-trained VLMs with different prompts in the classification of optic disc swelling on fundus photographs.

Multi-Omics and Clinical Validation Identify Key Glycolysis- and Immune-Related Genes in Sepsis.

Int J Gen Med

September 2025

Department of Geriatrics, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, 610072, People's Republic of China.

Background: Sepsis is characterized by profound immune and metabolic perturbations, with glycolysis serving as a pivotal modulator of immune responses. However, the molecular mechanisms linking glycolytic reprogramming to immune dysfunction remain poorly defined.

Methods: Transcriptomic profiles of sepsis were obtained from the Gene Expression Omnibus.

Accurate differentiation between persistent vegetative state (PVS) and minimally conscious state and estimation of recovery likelihood in patients in PVS are crucial. This study analyzed electroencephalography (EEG) metrics to investigate their relationship with consciousness improvements in patients in PVS and developed a machine learning prediction model. We retrospectively evaluated 19 patients in PVS, categorizing them into two groups: those with improved consciousness (n = 7) and those without improvement (n = 12).

Artificial intelligence (AI) is a technique or tool to simulate or emulate human "intelligence." Precision medicine or precision histology refers to the subpopulation-tailored diagnosis, therapeutics, and management of diseases with its sociocultural, behavioral, genomic, transcriptomic, and pharmaco-omic implications. The current decade has seen a quantum leap in AI-based models in various aspects of daily routines, including the practice of precision medicine and histology.

Introduction: Spinal cord injury (SCI) presents a significant burden to patients, families, and the healthcare system. The ability to accurately predict functional outcomes for SCI patients is essential for optimizing rehabilitation strategies, guiding patient and family decision making, and improving patient care.

Methods: We conducted a retrospective analysis of 589 SCI patients admitted to a single acute rehabilitation facility and used the dataset to train advanced machine learning algorithms to predict patients' rehabilitation outcomes.