Background Context: A machine learning (ML) model was recently developed to predict massive intraoperative blood loss (>2,500 mL) during posterior decompressive surgery for spinal metastasis; it performed well on external validation within the same region in China.
Purpose: We sought to externally validate this model across new geographic regions (North America and Europe) and patient cohorts.
Study Design: Multi-institutional retrospective cohort study.
Patient Sample: We retrospectively included patients 18 years or older who underwent decompressive surgery for spinal metastasis across three institutions in the United States, the United Kingdom, and the Netherlands between 2016 and 2022. Inclusion and exclusion criteria were consistent with those of the development study, with the additional inclusion of (1) patients undergoing palliative decompression without stabilization, (2) patients with multiple myeloma and lymphoma, and (3) patients who continued anticoagulants perioperatively.
Outcome Measures: Model performance was assessed by comparing the incidence of massive intraoperative blood loss (>2,500 mL) in our cohort with the risk predicted by the ML model. Because no gold standard exists and the method in the development paper was not clearly defined, blood loss was quantified in seven ways (including the formula from the development study): it was estimated from the anesthesia report and calculated from transfusion data and preoperative and postoperative hematocrit levels.
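Since the development paper does not specify its formula, one commonly used hematocrit-based approach is the Gross formula with blood volume estimated by the Nadler equations. The sketch below is illustrative only and is not necessarily the method used in either study; function and variable names are hypothetical.

```python
def nadler_blood_volume_l(sex: str, height_m: float, weight_kg: float) -> float:
    """Estimated blood volume in liters via the Nadler equations."""
    if sex == "male":
        return 0.3669 * height_m ** 3 + 0.03219 * weight_kg + 0.6041
    return 0.3561 * height_m ** 3 + 0.03308 * weight_kg + 0.1833


def estimated_blood_loss_ml(sex: str, height_m: float, weight_kg: float,
                            hct_pre: float, hct_post: float) -> float:
    """Gross formula: blood loss = EBV * (Hct_pre - Hct_post) / Hct_mean.

    Hematocrits are fractions (e.g., 0.40); returns milliliters.
    Transfused blood would need to be accounted for separately.
    """
    ebv_ml = nadler_blood_volume_l(sex, height_m, weight_kg) * 1000.0
    hct_mean = (hct_pre + hct_post) / 2.0
    return ebv_ml * (hct_pre - hct_post) / hct_mean
```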
Methods: The following five input variables required for risk calculation by the ML model were collected manually: tumor type, smoking status, ECOG score, surgical process, and preoperative platelet count. Model performance was assessed on overall fit (Brier score), discriminatory ability (area under the curve [AUC]), calibration (intercept and slope), and clinical utility (decision curve analysis) for the total validation cohort and for the North American and European cohorts separately. A subanalysis excluding the additionally included patient groups assessed the predictive model's performance under the same inclusion and exclusion criteria as the development cohort.
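As a rough illustration of how these four metric families are typically computed for a binary risk model, the sketch below assumes arrays of observed outcomes (`y_true`) and model-predicted risks (`y_prob`); it is not the authors' analysis code, and the calibration approach shown (logistic recalibration on the logit of the predictions) is one standard convention.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import brier_score_loss, roc_auc_score


def validation_metrics(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Overall fit, discrimination, and calibration for a binary risk model."""
    # Overall fit: Brier score (mean squared error of the predicted risks).
    brier = brier_score_loss(y_true, y_prob)

    # Discrimination: area under the ROC curve (AUC).
    auc = roc_auc_score(y_true, y_prob)

    # Calibration: work on the logit (log-odds) of the predicted risks.
    eps = 1e-12
    p = np.clip(y_prob, eps, 1 - eps)
    logit = np.log(p / (1 - p))

    # Calibration intercept (calibration-in-the-large): logistic model with the
    # logit entered as an offset, i.e. the slope fixed at 1.
    intercept_fit = sm.GLM(y_true, np.ones((len(y_true), 1)),
                           family=sm.families.Binomial(), offset=logit).fit()
    intercept = float(intercept_fit.params[0])

    # Calibration slope: logistic regression of the outcome on the logit.
    slope_fit = sm.GLM(y_true, sm.add_constant(logit),
                       family=sm.families.Binomial()).fit()
    slope = float(slope_fit.params[1])

    return {"brier": brier, "auc": auc,
            "calibration_intercept": intercept, "calibration_slope": slope}
```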
Results: A total of 880 patients were included, with a massive blood loss incidence ranging from 5.3% to 18% depending on the quantification method used. Using the most favorable quantification method, the predictive model overestimated risk in our total validation cohort and scored poorly on overall fit (Brier score: 0.278), discrimination (AUC: 0.631 [95% CI: 0.583, 0.680]), calibration (intercept: -2.082 [95% CI: -2.285, -1.879]; slope: 0.283 [95% CI: 0.173, 0.393]), and clinical utility, with net harm observed in decision curve analysis from a threshold of 20% onward. Similarly poor performance was observed in the subanalysis excluding the additionally included patients (n=676) and when the North American (n=539) and European (n=341) cohorts were analyzed separately.
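For context on the decision curve result, net benefit at a given threshold probability compares a model-guided treatment strategy against treating everyone or no one; "net harm" means the model's net benefit falls below these reference strategies. A minimal sketch of the standard net-benefit formula, with hypothetical variable names, is shown below.

```python
import numpy as np


def net_benefit(y_true: np.ndarray, y_prob: np.ndarray, threshold: float) -> float:
    """Net benefit of 'treat if predicted risk >= threshold' (decision curve analysis).

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    treat = y_prob >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: np.ndarray, threshold: float) -> float:
    """Reference strategy: treat everyone regardless of the model."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# A model shows net harm at a threshold when its net benefit is lower than both
# references: treat-all and treat-none (the latter has net benefit 0 by definition).
```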
Conclusions: To our knowledge, this is the first published external validation of a predictive ML model within orthopedic surgery to demonstrate poor performance. This poor performance might be attributed to overfitting and sampling bias, as the development cohort had an insufficient sample size, and to distributional shift, as our cohort differed in key predictive variables used by the model. These findings emphasize the importance of extensive validation across different geographical areas and of addressing biases and pitfalls of ML model development before clinical implementation, as untested models may do more harm than good.
DOI: http://dx.doi.org/10.1016/j.spinee.2025.03.018