Background Context: A machine learning (ML) model was recently developed to predict massive intraoperative blood loss (>2,500 mL) during posterior decompressive surgery for spinal metastasis; it performed well on external validation within the same region in China.
Purpose: We sought to externally validate this model across new geographic regions (North America and Europe) and patient cohorts.
Study Design: Multi-institutional retrospective cohort study.
Patient Sample: We retrospectively included patients 18 years or older who underwent decompressive surgery for spinal metastasis across three institutions in the United States, the United Kingdom, and the Netherlands between 2016 and 2022. Inclusion and exclusion criteria were consistent with those of the development study, with the additional inclusion of (1) patients undergoing palliative decompression without stabilization, (2) patients with multiple myeloma and lymphoma, and (3) patients who continued anticoagulants perioperatively.
Outcome Measures: Model performance was assessed by comparing the incidence of massive intraoperative blood loss (>2,500 mL) in our cohort with the risk predicted by the ML model. Because no gold standard exists and the method in the development paper was not clearly defined, blood loss was quantified in seven ways (including the formula from the development study): it was estimated from the anesthesia report and calculated from transfusion data and preoperative and postoperative hematocrit levels.
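Since the development paper does not specify its formula, one commonly used hematocrit-based approach is the Gross formula with blood volume estimated by the Nadler equations. The sketch below is illustrative only and is not necessarily the method used in either study; function and variable names are hypothetical.

```python
def nadler_blood_volume_l(sex: str, height_m: float, weight_kg: float) -> float:
    """Estimated blood volume in liters via the Nadler equations."""
    if sex == "male":
        return 0.3669 * height_m ** 3 + 0.03219 * weight_kg + 0.6041
    return 0.3561 * height_m ** 3 + 0.03308 * weight_kg + 0.1833


def estimated_blood_loss_ml(sex: str, height_m: float, weight_kg: float,
                            hct_pre: float, hct_post: float) -> float:
    """Gross formula: blood loss = EBV * (Hct_pre - Hct_post) / Hct_mean.

    Hematocrits are fractions (e.g., 0.40); returns milliliters.
    Transfused blood would need to be accounted for separately.
    """
    ebv_ml = nadler_blood_volume_l(sex, height_m, weight_kg) * 1000.0
    hct_mean = (hct_pre + hct_post) / 2.0
    return ebv_ml * (hct_pre - hct_post) / hct_mean
```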
Methods: The following five input variables required for risk calculation by the ML model were collected manually: tumor type, smoking status, ECOG score, surgical process, and preoperative platelet count. Model performance was assessed on overall fit (Brier score), discriminatory ability (area under the curve [AUC]), calibration (intercept and slope), and clinical utility (decision curve analysis) for the total validation cohort and for the North American and European cohorts separately. A subanalysis excluding the additionally included patient groups assessed the predictive model's performance under the same inclusion and exclusion criteria as the development cohort.
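As a rough illustration of how these four metric families are typically computed for a binary risk model, the sketch below assumes arrays of observed outcomes (`y_true`) and model-predicted risks (`y_prob`); it is not the authors' analysis code, and the calibration approach shown (logistic recalibration on the logit of the predictions) is one standard convention.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import brier_score_loss, roc_auc_score


def validation_metrics(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Overall fit, discrimination, and calibration for a binary risk model."""
    # Overall fit: Brier score (mean squared error of the predicted risks).
    brier = brier_score_loss(y_true, y_prob)

    # Discrimination: area under the ROC curve (AUC).
    auc = roc_auc_score(y_true, y_prob)

    # Calibration: work on the logit (log-odds) of the predicted risks.
    eps = 1e-12
    p = np.clip(y_prob, eps, 1 - eps)
    logit = np.log(p / (1 - p))

    # Calibration intercept (calibration-in-the-large): logistic model with the
    # logit entered as an offset, i.e. the slope fixed at 1.
    intercept_fit = sm.GLM(y_true, np.ones((len(y_true), 1)),
                           family=sm.families.Binomial(), offset=logit).fit()
    intercept = float(intercept_fit.params[0])

    # Calibration slope: logistic regression of the outcome on the logit.
    slope_fit = sm.GLM(y_true, sm.add_constant(logit),
                       family=sm.families.Binomial()).fit()
    slope = float(slope_fit.params[1])

    return {"brier": brier, "auc": auc,
            "calibration_intercept": intercept, "calibration_slope": slope}
```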
Results: A total of 880 patients were included, with a massive blood loss incidence ranging from 5.3% to 18% depending on the quantification method used. Using the most favorable quantification method, the predictive model overestimated risk in our total validation cohort and scored poorly on overall fit (Brier score: 0.278), discrimination (AUC: 0.631 [95% CI: 0.583, 0.680]), calibration (intercept: -2.082 [95% CI: -2.285, -1.879]; slope: 0.283 [95% CI: 0.173, 0.393]), and clinical utility, with net harm observed in decision curve analysis from a threshold of 20% onward. Similarly poor performance was observed in the subanalysis excluding the additionally included patients (n=676) and when the North American (n=539) and European (n=341) cohorts were analyzed separately.
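For context on the decision curve result, net benefit at a given threshold probability compares a model-guided treatment strategy against treating everyone or no one; "net harm" means the model's net benefit falls below these reference strategies. A minimal sketch of the standard net-benefit formula, with hypothetical variable names, is shown below.

```python
import numpy as np


def net_benefit(y_true: np.ndarray, y_prob: np.ndarray, threshold: float) -> float:
    """Net benefit of 'treat if predicted risk >= threshold' (decision curve analysis).

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    treat = y_prob >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: np.ndarray, threshold: float) -> float:
    """Reference strategy: treat everyone regardless of the model."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# A model shows net harm at a threshold when its net benefit is lower than both
# references: treat-all and treat-none (the latter has net benefit 0 by definition).
```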
Conclusions: To our knowledge, this is the first published external validation of a predictive ML model within orthopedic surgery to demonstrate poor performance. This poor performance might be attributed to overfitting and sampling bias, as the development cohort had an insufficient sample size, and to distributional shift, as our cohort differed in key predictive variables used by the model. These findings emphasize the importance of extensive validation across different geographical areas and of addressing biases and pitfalls of ML model development before clinical implementation, as untested models may do more harm than good.
DOI: http://dx.doi.org/10.1016/j.spinee.2025.03.018