Article Abstract

The increasing complexity and volume of radiology reports present challenges for the timely communication of critical findings. This study aimed to evaluate the performance of two out-of-the-box LLMs in detecting and classifying critical findings in radiology reports using various prompt strategies.

The analysis included 252 radiology reports of varying modalities and anatomic regions extracted from the MIMIC-III database, divided into a prompt engineering tuning set of 50 reports, a holdout test set of 125 reports, and a pool of 77 remaining reports used as examples for few-shot prompting. An external test set of 180 chest radiography reports was extracted from the CheXpert Plus database. Reports were manually reviewed to identify critical findings and to classify each finding into one of three categories (true critical finding, known/expected critical finding, equivocal critical finding). Following prompt engineering using various prompt strategies, a final prompt for optimal detection of true critical findings was selected. Two general-purpose LLMs, GPT-4 and Mistral-7B, processed the reports in the test sets using the final prompt. Evaluation included automated text similarity metrics (BLEU-1, ROUGE-F1, G-Eval) and manual performance metrics (precision, recall).

For true critical findings, zero-shot, few-shot static (five examples), and few-shot dynamic (five examples) prompting yielded, respectively, BLEU-1 of 0.691, 0.778, and 0.748; ROUGE-F1 of 0.706, 0.797, and 0.773; and G-Eval of 0.428, 0.573, and 0.516. Precision and recall for true critical findings, known/expected critical findings, and equivocal critical findings, respectively, were as follows:

Holdout test set, GPT-4: 90.1% and 86.9%, 80.9% and 85.0%, and 80.5% and 94.3%.
Holdout test set, Mistral-7B: 75.6% and 77.4%, 34.1% and 70.0%, and 41.3% and 74.3%.
External test set, GPT-4: 82.6% and 98.3%, 76.9% and 71.4%, and 70.8% and 85.0%.
External test set, Mistral-7B: 75.0% and 93.1%, 33.3% and 92.9%, and 34.0% and 80.0%.

Out-of-the-box LLMs were used to detect and classify arbitrary numbers of critical findings in radiology reports. The optimal model for true critical findings entailed a few-shot static approach. The study shows a role for contemporary general-purpose models in adapting to specialized medical tasks using minimal data annotation.
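To make the per-category precision and recall figures concrete, the following is a minimal sketch of how such metrics can be tallied when each report may contain an arbitrary number of findings. It is not the study's evaluation code: the abstract states that these metrics were obtained by manual review, whereas this sketch assumes exact matching of (finding, category) pairs. The function name `precision_recall`, the category labels, and the three toy reports are all hypothetical.

```python
from collections import Counter

# Hypothetical short labels for the three categories named in the abstract.
CATEGORIES = ("true", "known/expected", "equivocal")


def precision_recall(predicted, reference):
    """Return {category: (precision, recall)} pooled over all reports.

    `predicted` and `reference` are parallel lists with one set per report;
    each set holds (finding_text, category) pairs. Matching is exact, which
    is an assumption of this sketch (the study used manual review).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, ref in zip(predicted, reference):
        for _, cat in pred & ref:   # found by the model and by the reviewer
            tp[cat] += 1
        for _, cat in pred - ref:   # found by the model only
            fp[cat] += 1
        for _, cat in ref - pred:   # found by the reviewer only
            fn[cat] += 1

    scores = {}
    for cat in CATEGORIES:
        p_den = tp[cat] + fp[cat]
        r_den = tp[cat] + fn[cat]
        # Report 0.0 when a denominator is empty (a convention of this sketch).
        scores[cat] = (
            tp[cat] / p_den if p_den else 0.0,
            tp[cat] / r_den if r_den else 0.0,
        )
    return scores


# Toy data, invented for illustration only.
reference = [
    {("pneumothorax", "true")},
    {("pulmonary embolism", "true"),
     ("known aortic dissection", "known/expected")},
    set(),
]
predicted = [
    {("pneumothorax", "true")},
    {("pulmonary embolism", "true")},
    {("possible free air", "equivocal"), ("pneumonia", "true")},
]

scores = precision_recall(predicted, reference)
for category, (precision, recall) in scores.items():
    print(f"{category}: precision={precision:.3f}, recall={recall:.3f}")
```

Counts are pooled across reports before dividing, so a report with many findings contributes more than a report with one. Whether the study pooled counts this way is not stated in the abstract.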

Source
http://dx.doi.org/10.2214/AJR.25.33469
