Evaluation of retrieval-augmented generation and large language models in clinical guidelines for degenerative spine conditions.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Purpose: Degenerative spinal diseases often require complex, patient-specific treatment, presenting a compelling challenge for artificial intelligence (AI) integration into clinical practice. While existing literature has focused on ChatGPT-4o performance in individual spine conditions, this study compares ChatGPT-4o, a traditional large language model (LLM), against NotebookLM, a novel retrieval-augmented model (RAG-LLM) supplemented with North American Spine Society (NASS) guidelines, for concordance with all five published NASS guidelines for degenerative spinal diseases.
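
The abstract does not describe how NotebookLM incorporates the NASS guidelines internally. As a rough illustration of the retrieval-augmented pattern it refers to, the sketch below retrieves the guideline passages most similar to a question and prepends them to the prompt; the passage texts, the TF-IDF retriever, and the prompt wording are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of retrieval-augmented prompting (illustrative only; the
# abstract does not describe NotebookLM's internals). The passages, retriever,
# and prompt template below are assumptions, not the NASS guideline text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical passages standing in for guideline excerpts.
passages = [
    "For lumbar disc herniation with radiculopathy, a trial of structured "
    "exercise is suggested before surgical referral.",
    "Epidural steroid injections may provide short-term relief in selected "
    "patients with radicular pain.",
    "Fusion is not recommended as first-line treatment for uncomplicated "
    "degenerative disc disease.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(passages + [question])
    passage_vecs = vectorizer.transform(passages)
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, passage_vecs).ravel()
    top = scores.argsort()[::-1][:k]
    return [passages[i] for i in top]

def build_prompt(question: str) -> str:
    """Prepend retrieved guideline text so the model answers from it."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    return (
        "Answer using only the guideline excerpts below.\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Is fusion recommended for degenerative disc disease?"))
```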

Methods: A total of 118 questions drawn from the NASS guidelines for five degenerative spinal conditions were presented to ChatGPT-4o and NotebookLM. All responses were scored on accuracy, evidence-based conclusions, supplementary information, and completeness of information.

Results: Overall, NotebookLM provided significantly more accurate responses (98.3% vs. 40.7%, p < 0.05), more evidence-based conclusions (99.1% vs. 40.7%, p < 0.05), and more complete information (94.1% vs. 79.7%, p < 0.05), while ChatGPT-4o provided more supplementary information (98.3% vs. 67.8%, p < 0.05). These discrepancies were most pronounced for questions on nonsurgical and surgical interventions, where ChatGPT-4o often produced recommendations with unsubstantiated certainty.
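
The abstract reports proportions over the 118 questions (e.g., accuracy 98.3% vs. 40.7%) with p < 0.05 but does not name the statistical test used. As a sanity check only, the sketch below back-calculates counts from the reported percentages and applies a two-proportion Fisher's exact test; the authors' analysis may have differed (a paired test such as McNemar's would also suit this design, since both models answered the same questions).

```python
# Illustrative check of the accuracy comparison (98.3% vs. 40.7% of 118
# questions). The test choice and the back-calculated counts are assumptions;
# the abstract does not state which test the authors used.
from scipy.stats import fisher_exact

n_questions = 118
notebooklm_accurate = round(0.983 * n_questions)   # 116
chatgpt_accurate = round(0.407 * n_questions)      # 48

table = [
    [notebooklm_accurate, n_questions - notebooklm_accurate],
    [chatgpt_accurate, n_questions - chatgpt_accurate],
]
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher's exact p = {p_value:.2e}")  # far below 0.05, consistent with the abstract
```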

Conclusion: While RAG-LLMs are a promising tool for clinical decision-making assistance and show significant improvement from prior models, physicians should remain cautious when integrating AI into patient care, especially in the context of nuanced medical scenarios.

Source: http://dx.doi.org/10.1007/s00586-025-08994-8 (DOI Listing)

Publication Analysis

Top Keywords

guidelines degenerative: 12
degenerative spinal: 12
nass guidelines: 12
large language: 8
spine conditions: 8
evaluation retrieval-augmented: 4
retrieval-augmented generation: 4
generation large: 4
language models: 4
models clinical: 4
