Article Abstract

Aims: The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of three AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.

Methods: A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.

Results: Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreement was observed between the surgeons' second-occasion responses and ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, while surgeons provided more concise answers.

Conclusions: AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.
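
The abstract does not state which Kappa variant was used. As a minimal sketch, assuming two-rater Cohen's kappa over categorical answers, the statistic and the conventional Landis and Koch descriptive bands (which appear consistent with the "fair" and "moderate" labels above) could be computed as follows. The function names and the sample answers are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical answers to four questions, for illustration only.
surgeon = ["A", "A", "B", "B"]
model = ["A", "B", "B", "B"]
kappa = cohen_kappa(surgeon, model)
print(kappa, landis_koch_label(kappa))
```

Under these bands, the reported values of 0.24 and 0.27 fall in the "fair" range and 0.535 in the "moderate" range.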

Source
http://dx.doi.org/10.1007/s00586-025-08825-w

Similar Publications

Background: Minimally invasive transforaminal lumbar interbody fusion (MIS-TLIF) is widely adopted for the treatment of lumbar degenerative disease. Expandable cages are now increasingly used in MIS-TLIF to facilitate disc height restoration in narrow spaces. Despite theoretical advantages, the clinical and radiologic outcomes of expandable cage compared to static cage remain controversial.

Objective: Variations exist among surgeons in the treatment of recurrent lumbar disc herniation (LDH), generating major issues in decision-making models. The authors aimed to identify international nuances in surgical treatment patterns, highlight the differences in responses in each country group and different treatment trends across countries, and identify factors that influence surgical decisions.

Methods: An online survey with preformulated answers was submitted to 292 orthopedic surgeons and 223 neurosurgeons from 16 countries regarding 3 clinical vignettes (recurrence without low back pain, recurrence with severe low back pain, and recurrence with 2-level disc disease).

Foot drop is relatively common and can be a notable source of patient dissatisfaction and even potential litigation. For the spine surgeon evaluating such a patient, the natural inclination is to investigate a spinal etiology; however, foot drop can develop from a multitude of distinct insults along several locations, extending from the cerebral cortex to the leg musculature itself. In-depth understanding of the relevant anatomy implicated in foot drop, as well as the pathologies that may impede those structures, is paramount to expanding a surgeon's differential diagnosis.

Purpose: Intraoperative bleeding remains a major challenge in lumbar spine surgery, with conventional assessment methods lacking standardization. The Validated Intraoperative Bleeding Severity Scale (VIBe) is a structured five-grade tool developed to objectively assess bleeding severity across surgical fields. This study evaluated the clinical utility of VIBe in lumbar spinal fusion by comparing it with conventional bleeding metrics across various hemostatic strategies, including hypotensive anesthesia and local hemostatic agent use.

A (P56S) mutation in a Dutch patient with familial motor neuron disease: a case report.

Amyotroph Lateral Scler Frontotemporal Degener

September 2025

Department of Neurology, Brain Centre Rudolf Magnus, University Medical Centre Utrecht, Utrecht, The Netherlands.

The c.166C > T p.(Pro56Ser) or P56S mutation in the gene was initially identified as a cause of motor neuron disease in Brazil in a large extended pedigree comprising >1,500 individuals including more than 200 cases.