AI versus the spinal surgeons in the management of controversial spinal surgery scenarios.

Saylan Mehmet, Mohamed Nabil Elmarawany, Ian Harding, Andrew James Bowey, John Andrews, Daniel Chan, Raveen Jayasuriya, Shreya Srinivas, James Tomlinson, Edward Bayley, Michael Paul Grevitt, Stuart James, Alwyn Jones, Michael J H McCarthy

Eur Spine J

University Hospital of Wales, Cardiff, UK.

Published: April 2025

Aims: The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of three AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.

Methods: A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.
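As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of Cohen's kappa for two raters answering the same set of questions. The abstract does not say which kappa variant or software the authors used, so the function name `cohen_kappa` and the answer lists below are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical answers to ten two-option questions (illustrative only).
surgeon_answers = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "B"]
ai_answers = ["A", "B", "B", "A", "A", "B", "B", "A", "B", "A"]

kappa = cohen_kappa(surgeon_answers, ai_answers)
print(f"kappa = {kappa:.2f}")
```

In this toy case the raters agree on 6 of 10 items while chance alone would give 5 of 10, so kappa is about 0.2. How the five AI responses per question were reduced to a single answer before comparison is not described in the abstract and is left out of the sketch.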

Results: Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreement was observed between the surgeons' second-occasion responses and both ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, whereas surgeons provided more concise answers.
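The verbal labels attached to the Kappa values above can be reproduced with a small lookup. This sketch assumes the widely used Landis and Koch bands; the abstract does not name the scale the authors applied, and the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch agreement band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1.")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted in the Results paragraph.
for value in (0.24, 0.27, 0.535, 1.00):
    print(value, landis_koch_label(value))
```

Under this assumed scale, 0.24 and 0.27 fall in the "fair" band and 0.535 in the "moderate" band, consistent with the wording of the Results; a value of 1.00 sits at the top of the "almost perfect" band.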

Conclusions: AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.

DOI: http://dx.doi.org/10.1007/s00586-025-08825-w

Similar Publications

Expandable Cage in Minimally Invasive Transforaminal Lumbar Interbody Fusion: Comparative Data with Static Cage from Single Institution, Single Surgeon.

World Neurosurg

September 2025

Department of Neurosurgery, The Spine and Spinal Cord Institute, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.

Dongkyu Kim, Hyun Jun Jang, Bong Ju Moon, Kyung Hyun Kim, Sung Uk Kuh

Background: Minimally invasive transforaminal lumbar interbody fusion (MIS-TLIF) is widely adopted for the treatment of lumbar degenerative disease. Expandable cages are increasingly used in MIS-TLIF to facilitate disc height restoration in narrow spaces. Despite their theoretical advantages, the clinical and radiologic outcomes of expandable cages compared with static cages remain controversial.


To fuse or not to fuse: surgical strategies for recurrent lumbar disc herniation from a 16-nation study.

J Neurosurg Spine

September 2025

Department of Neurosurgery, Medstar Georgetown University Hospital, Washington, DC.

Bertrand Debono, Guillaume Lonjon, Luis Alvarez-Galovich, Junseok Bae, Thami Benzakour

Objective: Variations exist among surgeons in the treatment of recurrent lumbar disc herniation (LDH), generating major issues in decision-making models. The authors aimed to identify international nuances in surgical treatment patterns, highlight the differences in responses in each country group and different treatment trends across countries, and identify factors that influence surgical decisions.

Methods: An online survey with preformulated answers was submitted to 292 orthopedic surgeons and 223 neurosurgeons from 16 countries regarding 3 clinical vignettes (recurrence without low back pain, recurrence with severe low back pain, and recurrence with 2-level disc disease).


Examining the Anatomy, Pathophysiology, and Clinical Presentation of Lower Extremity Neurologic Deficits: A Spine Surgeon's Guide to Foot Drop.

J Am Acad Orthop Surg

December 2024

From the Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, PA.

Peter Swiatek, Alec M Giakas, Rajkishen Narayanan, Jonathan Dalton, Alexander R Vaccaro

Foot drop is relatively common and can be a notable source of patient dissatisfaction and even potential litigation. For the spine surgeon evaluating such a patient, the natural inclination is to investigate a spinal etiology; however, foot drop can develop from a multitude of distinct insults along several locations, extending from the cerebral cortex to the leg musculature itself. In-depth understanding of the relevant anatomy implicated in foot drop, as well as the pathologies that may impede those structures, is paramount to expanding a surgeon's differential diagnosis.


Validated intraoperative bleeding severity scale (VIBe) for hemostasis assessment in lumbar spinal fusion: a prospective, randomized controlled trial.

Eur Spine J

September 2025

Department of Orthopedic Surgery, Yonsei University College of Medicine, Seoul, Republic of Korea.

Namhoo Kim, Sub-Ri Park, Jae Won Shin, Ji-Won Kwon, Si-Young Park

Purpose: Intraoperative bleeding remains a major challenge in lumbar spine surgery, with conventional assessment methods lacking standardization. The Validated Intraoperative Bleeding Severity Scale (VIBe) is a structured five-grade tool developed to objectively assess bleeding severity across surgical fields. This study evaluated the clinical utility of VIBe in lumbar spinal fusion by comparing it with conventional bleeding metrics across various hemostatic strategies, including hypotensive anesthesia and local hemostatic agent use.


A (P56S) mutation in a Dutch patient with familial motor neuron disease: a case report.

Amyotroph Lateral Scler Frontotemporal Degener

September 2025

Department of Neurology, Brain Centre Rudolf Magnus, University Medical Centre Utrecht, Utrecht, The Netherlands.

Sean W Willemse, Koen C Demaegd, Ruben P A Van Eijk, Philippe Van Damme, Elizabeth Harrington

The c.166C>T p.(Pro56Ser) or P56S mutation in the gene was initially identified as a cause of motor neuron disease in Brazil in a large extended pedigree comprising >1,500 individuals, including more than 200 cases.
