Machine learning and discriminant analysis model for predicting benign and malignant pulmonary nodules.

Zhi Li, Wenjing Zhang, Jinyi Huang, Ling Lu, Dongming Xie, Jinrong Zhang, Jiamin Liang, Yuepeng Sui, Linyuan Liu, Jianjun Zou, Ao Lin, Lei Yang, Fuman Qiu, Zhaoting Hu, Mei Wu, Yibin Deng, Xin Zhang, Jiachun Lu

BMC Med Inform Decis Mak

The Key Laboratory of Advanced Interdisciplinary Studies, The First Affiliated Hospital of Guangzhou Medical University, The Institute for Chemical Carcinogenesis, School of Public Health, Guangzhou Medical University, Guangzhou, 511436, China.

Published: July 2025

Background: Pulmonary nodules (PNs) are commonly regarded as an early manifestation of lung cancer. PNs that remain stable for more than two years, or whose pathology rules out lung cancer, are considered benign PNs (BPNs), whereas PNs that follow a tumor growth pattern, or whose pathology indicates lung cancer, are considered malignant PNs (MPNs). Currently, more than 90% of PNs detected by screening are benign, with a false positive rate of up to 96.4%. Although a range of predictive models has been developed to identify MPNs, distinguishing BPNs from MPNs remains challenging.

Methods: We included a total of 5197 patients in the case-control study according to the preset exclusion criteria and sample size. Among them, 4735 with BPNs and 2509 with MPNs were randomly divided into training, validation, and test sets in a 7:1.5:1.5 ratio. Three widely used machine learning algorithms (Random Forest, Gradient Boosting Machine, and XGBoost) were used to screen candidate features; the corresponding predictive models were then constructed using discriminant analysis, and the best-performing model was selected as the target model. The model was internally validated with 10-fold cross-validation and compared with the PKUPH and Block models.
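
The 7:1.5:1.5 partition and the 10-fold cross-validation described in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `stratified_split` and `k_fold_indices`, the fixed seed, the toy class counts, and the choice to stratify by class label are all assumptions (the abstract states only that patients were randomly divided).

```python
import random

def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/validation/test, class by class.

    Stratifying by label is an assumption made for this sketch; the
    abstract only states that the division was random.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, held_out_idx) pairs for k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, held_out in enumerate(folds):
        rest = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield rest, held_out

# Toy labels: 0 = benign, 1 = malignant (counts are illustrative only).
labels = [0] * 200 + [1] * 100
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Each sample lands in exactly one of the three sets, and in the cross-validation helper each sample is held out exactly once across the ten folds.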

Results: Among chest CT examinations performed from 2018 to 2021 in the physical examination population, the detection rate of PNs was 21.57% and showed an overall upward trend. The GMU_D model, constructed by discriminant analysis on features screened by machine learning, showed excellent discriminative performance (AUC = 0.866, 95% CI: 0.858-0.874) and a higher AUC than the PKUPH model (AUC = 0.559, 95% CI: 0.552-0.567) and the Block model (AUC = 0.823, 95% CI: 0.814-0.833). The cross-validation results were likewise excellent (AUC = 0.866, 95% CI: 0.858-0.874).
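
For readers unfamiliar with how an AUC and its 95% CI can be obtained, the sketch below computes the AUC as the fraction of (malignant, benign) pairs ranked correctly and attaches a percentile bootstrap interval. The abstract does not say how the reported intervals were calculated, so the bootstrap, the function names `auc` and `bootstrap_ci`, and the toy scores are assumptions made purely for illustration.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a positive case scores above a negative one.

    Ties count one half; this equals the Mann-Whitney U statistic divided
    by (number of positives * number of negatives).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Approximate percentile bootstrap CI for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo_pos = max(0, int(alpha / 2 * n_boot))
    hi_pos = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_pos], stats[hi_pos]

# Toy example: 1 = malignant, 0 = benign; scores are hypothetical model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_scores, toy_labels))  # 3 of 4 pairs ranked correctly -> 0.75
print(bootstrap_ci(toy_scores, toy_labels))
```

With only four toy samples the interval is very wide; the narrow intervals reported above reflect the much larger study sample.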

Conclusion: The detection rate of PNs was 21.57% in the physical examination population undergoing chest CT. Based on this real-world study of PNs, an improved prediction tool was developed and validated that distinguishes BPNs from MPNs with excellent predictive performance and discrimination.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12275455
DOI: http://dx.doi.org/10.1186/s12911-025-03067-8
