Information Extraction from Lumbar Spine MRI Radiology Reports Using GPT4: Accuracy and Benchmarking Against Research-Grade Comprehensive Scoring.

Katharina Ziegeler, Virginie Kreutzinger, Michelle W Tong, Cynthia T Chin, Emma Bahroos, Po-Hung Wu, Noah Bonnheim, Aaron J Fields, Jeffrey C Lotz, Thomas M Link, Sharmila Majumdar

Diagnostics (Basel)

Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA 94143, USA.

Published: April 2025


Background/Objectives: This study aimed to create a pipeline for standardized data extraction from lumbar spine MRI radiology reports using a large language model (LLM) and to assess the agreement of the extracted data with research-grade semi-quantitative scoring.

Methods: We included a subset of data from a multi-site, NIH-funded cohort study of participants with chronic low back pain (cLBP). After initial prompt development, a secure application programming interface (API) deployment of OpenAI's GPT-4 was used to extract different classes of pathology from the clinical radiology reports. Unsupervised UMAP and agglomerative clustering of the pathology terms' embeddings provided insight into model comprehension and informed prompt optimization. Model extraction was benchmarked against human extraction (gold standard) using F1 scores and false-positive and false-negative rates (FPR/FNR). An expert musculoskeletal (MSK) radiologist then provided comprehensive research-grade scores of the images, and agreement with the report-extracted data was calculated using Cohen's kappa.

Results: Data from 230 patients with cLBP were included (mean age 53.2 years; 54% women). Overall model performance for extracting data from clinical reports was excellent, with a mean F1 score of 0.96 across pathologies. The mean FPR was marginally higher than the FNR (5.1% vs. 3.0%). Agreement with comprehensive scoring was moderate (kappa 0.424), with underreporting of lateral recess stenosis (FNR 63.6%) and overreporting of disc pathology (FPR 42.7%).

Conclusions: LLMs can accurately extract highly detailed information on lumbar spine imaging pathologies from radiology reports. The only moderate agreement between LLM-extracted report data and comprehensive scores underscores the need for less subjective, machine-based data extraction from imaging.
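The benchmarking of model extraction against human extraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each report is reduced to one present/absent label per pathology class, uses the standard confusion-matrix definitions F1 = 2TP/(2TP+FP+FN), FPR = FP/(FP+TN) and FNR = FN/(FN+TP) (the abstract does not spell out its denominators), and treats the "mean" across pathologies as an unweighted average. All function names and the toy labels are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix cells for one pathology: LLM output vs. human gold standard."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (e.g. no positives at all) become NaN instead of raising.
    return num / den if den else math.nan


def tally(gold: list[bool], predicted: list[bool]) -> Counts:
    """Count per-report agreement for a single pathology (True = pathology reported)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same reports")
    c = Counts()
    for g, p in zip(gold, predicted):
        if g and p:
            c.tp += 1
        elif p:  # LLM says present, human says absent
            c.fp += 1
        elif g:  # human says present, LLM missed it
            c.fn += 1
        else:
            c.tn += 1
    return c


def scores(c: Counts) -> dict[str, float]:
    return {
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "fnr": _ratio(c.fn, c.fn + c.tp),
    }


def _nanmean(values: list[float]) -> float:
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def benchmark(gold: dict[str, list[bool]], predicted: dict[str, list[bool]]):
    """Per-pathology scores plus their unweighted (macro) mean across pathologies."""
    per_pathology = {name: scores(tally(gold[name], predicted[name])) for name in gold}
    means = {m: _nanmean([s[m] for s in per_pathology.values()]) for m in ("f1", "fpr", "fnr")}
    return per_pathology, means


if __name__ == "__main__":
    # Hypothetical toy labels for five reports and one pathology class.
    gold = {"disc_bulge": [True, True, False, False, True]}
    llm = {"disc_bulge": [True, False, False, True, True]}
    per_pathology, means = benchmark(gold, llm)
    print(per_pathology["disc_bulge"])  # F1 = 4/6, FPR = 1/2, FNR = 1/3
    print(means)
```

Returning NaN for undefined ratios (for example, a pathology absent from every report, which leaves FNR with a zero denominator) keeps such classes from silently counting as perfect or as zero in the averaged scores.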

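Agreement between report-extracted findings and the radiologist's comprehensive scores is summarized with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal label frequencies. The sketch below assumes the unweighted statistic over categorical labels; the abstract does not state whether a weighted variant was used. The function name and example ratings are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    n = len(rater_a)
    if n == 0:
        raise ValueError("need at least one rated item")
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of P_a(k) * P_b(k).
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is 0/0.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical stenosis grades: report-extracted vs. research-grade score.
    from_report = ["none", "mild", "none", "severe", "mild"]
    radiologist = ["mild", "mild", "none", "severe", "moderate"]
    print(round(cohens_kappa(from_report, radiologist), 3))
```

Unlike raw percent agreement, kappa discounts matches expected by chance, which matters when one label (such as "no stenosis") dominates. On the widely used Landis and Koch scale, values between 0.41 and 0.60 are labelled moderate.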
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11989208
DOI: http://dx.doi.org/10.3390/diagnostics15070930
