Aims: The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.
Methods: A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.
Results: Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreements were observed between surgeons' second occasion responses and ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, while surgeons provided more concise answers.
Conclusions: AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.
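The agreement statistic reported above can be illustrated with a minimal sketch of Cohen's kappa for two sets of categorical answers. This is not the authors' analysis code: the function names, the example answers, and the use of the Landis and Koch interpretation bands are assumptions made for illustration. The abstract does not state which kappa variant or interpretation scale was used, although its labels ("fair" for 0.24 and 0.27, "moderate" for 0.535) are consistent with those bands.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical answers."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of answers")
    n = len(rater_a)
    # Observed agreement: share of questions answered identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters gave one identical answer throughout; kappa is undefined,
        # and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label using the Landis and Koch bands (an assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical answers to four questions (not data from the study).
surgeon = ["operate", "operate", "conserve", "conserve"]
model = ["operate", "conserve", "conserve", "conserve"]
kappa = cohen_kappa(surgeon, model)
print(kappa, landis_koch_label(kappa))
```

In this toy example the raters agree on three of four questions (0.75), chance agreement is 0.5, and kappa is therefore 0.5, which falls in the "moderate" band.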
DOI: http://dx.doi.org/10.1007/s00586-025-08825-w
World Neurosurg
September 2025
Department of Neurosurgery, The Spine and Spinal Cord Institute, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.
Background: Minimally invasive transforaminal lumbar interbody fusion (MIS-TLIF) is widely adopted for the treatment of lumbar degenerative disease. Expandable cages are increasingly used in MIS-TLIF to facilitate disc height restoration in narrow spaces. Despite their theoretical advantages, the clinical and radiologic outcomes of expandable cages compared with static cages remain controversial.
J Neurosurg Spine
September 2025
Department of Neurosurgery, Medstar Georgetown University Hospital, Washington, DC.
Objective: Variations exist among surgeons in the treatment of recurrent lumbar disc herniation (LDH), generating major issues in decision-making models. The authors aimed to identify international nuances in surgical treatment patterns, highlight the differences in responses in each country group and different treatment trends across countries, and identify factors that influence surgical decisions.
Methods: An online survey with preformulated answers was submitted to 292 orthopedic surgeons and 223 neurosurgeons from 16 countries regarding 3 clinical vignettes (recurrence without low back pain, recurrence with severe low back pain, and recurrence with 2-level disc disease).
J Am Acad Orthop Surg
December 2024
From the Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, PA.
Foot drop is relatively common and can be a notable source of patient dissatisfaction and even potential litigation. For the spine surgeon evaluating such a patient, the natural inclination is to investigate a spinal etiology; however, foot drop can develop from a multitude of distinct insults along several locations, extending from the cerebral cortex to the leg musculature itself. In-depth understanding of the relevant anatomy implicated in foot drop, as well as the pathologies that may impede those structures, is paramount to expanding a surgeon's differential diagnosis.
Eur Spine J
September 2025
Department of Orthopedic Surgery, Yonsei University College of Medicine, Seoul, Republic of Korea.
Purpose: Intraoperative bleeding remains a major challenge in lumbar spine surgery, with conventional assessment methods lacking standardization. The Validated Intraoperative Bleeding Severity Scale (VIBe) is a structured five-grade tool developed to objectively assess bleeding severity across surgical fields. This study evaluated the clinical utility of VIBe in lumbar spinal fusion by comparing it with conventional bleeding metrics across various hemostatic strategies, including hypotensive anesthesia and local hemostatic agent use.
Amyotroph Lateral Scler Frontotemporal Degener
September 2025
Department of Neurology, Brain Centre Rudolf Magnus, University Medical Centre Utrecht, Utrecht, The Netherlands.
The c.166C > T p.(Pro56Ser) or P56S mutation in the gene was initially identified as a cause of motor neuron disease in Brazil in a large extended pedigree comprising >1,500 individuals including more than 200 cases.