Quantitative Evaluation of Large Language Models to Streamline Radiology Report Impressions: A Multimodal Retrospective Analysis.

Radiology

From the Yale School of Medicine (R.D., P.K.) and Department of Radiology and Biomedical Imaging (K.S.A., S.S.B., S.C., H.P.F.), Yale School of Medicine, 333 Cedar St, New Haven, CT 06510; Yale School of Management, New Haven, Conn (H.P.F.); and Department of Health Policy and Management, Yale Schoo

Published: March 2024


Article Abstract

Background The complex medical terminology of radiology reports may cause confusion or anxiety for patients, especially given increased access to electronic health records. Large language models (LLMs) can potentially simplify radiology report readability.
Purpose To compare the performance of four publicly available LLMs (ChatGPT-3.5 and ChatGPT-4, Bard [now known as Gemini], and Bing) in producing simplified radiology report impressions.
Materials and Methods In this retrospective comparative analysis of the four LLMs (accessed July 23 to July 26, 2023), the Medical Information Mart for Intensive Care (MIMIC)-IV database was used to gather 750 anonymized radiology report impressions covering a range of imaging modalities (MRI, CT, US, radiography, mammography) and anatomic regions. Three distinct prompts were employed to assess the LLMs' ability to simplify report impressions. The first prompt (prompt 1) was "Simplify this radiology report." The second prompt (prompt 2) was "I am a patient. Simplify this radiology report." The last prompt (prompt 3) was "Simplify this radiology report at the 7th grade level." Each prompt was followed by the radiology report impression and was queried once. The primary outcome was simplification as assessed by readability score. Readability was assessed using the average of four established readability indexes. The nonparametric Wilcoxon signed-rank test was applied to compare reading grade levels across LLM output.
Results All four LLMs simplified radiology report impressions across all prompts tested (P < .001). Within prompts, differences were found between LLMs. Providing the context of being a patient or requesting simplification at the seventh-grade level reduced the reading grade level of output for all models and prompts (except prompt 1 to prompt 2 for ChatGPT-4) (P < .001).
Conclusion Although the success of each LLM varied depending on the specific prompt wording, all four models simplified radiology report impressions across all modalities and prompts tested.
© RSNA, 2024
See also the editorial by Rahsepar in this issue.
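
The readability outcome described in the Materials and Methods can be illustrated with a short sketch. The abstract does not name the four readability indexes, so this example assumes Flesch-Kincaid Grade Level, Gunning Fog, Automated Readability Index, and Coleman-Liau. The syllable counter is a rough heuristic, the function names are hypothetical, and the sample sentences are invented for illustration; none of this is the authors' actual pipeline. The second function computes only the Wilcoxon signed-rank sums for paired grade levels, not a P value.

```python
import re
from statistics import mean


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def grade_level(text: str) -> float:
    """Average of four grade-level indexes (which four is assumed, not from the abstract)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    n_words = len(words)
    n_sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = n_words / n_sentences  # words per sentence

    flesch_kincaid = 0.39 * wps + 11.8 * syllables / n_words - 15.59
    gunning_fog = 0.4 * (wps + 100 * complex_words / n_words)
    ari = 4.71 * letters / n_words + 0.5 * wps - 21.43
    coleman_liau = (0.0588 * (100 * letters / n_words)
                    - 0.296 * (100 * n_sentences / n_words) - 15.8)
    return mean([flesch_kincaid, gunning_fog, ari, coleman_liau])


def signed_rank_sums(before, after):
    """Wilcoxon signed-rank sums (W+, W-) for paired values; zero differences dropped."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Invented example texts, not taken from MIMIC-IV.
original = "Findings are compatible with multifocal pneumonia with superimposed pulmonary edema."
simplified = "You have a lung infection. There is also some fluid in your lungs."
print(grade_level(original), grade_level(simplified))
```

A P value would additionally require the reference distribution of the smaller rank sum; a statistics library such as SciPy (scipy.stats.wilcoxon) provides this for paired samples.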

Source
http://dx.doi.org/10.1148/radiol.231593
