Comparison of CT referral justification using clinical decision support and large language models in a large European cohort.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Background: Ensuring appropriate use of CT scans is critical for patient safety and resource optimization. Decision support tools and artificial intelligence (AI), such as large language models (LLMs), have the potential to improve CT referral justification, yet require rigorous evaluation against established standards and expert assessments.

Aim: To evaluate the performance of LLMs (Generative Pre-trained Transformer 4 (GPT-4) and Claude-3 Haiku) and independent experts in justifying CT referrals, compared to the ESR iGuide clinical decision support system as the reference standard.

Methods: CT referral data from 6356 patients were retrospectively analyzed. Recommendations were generated by the ESR iGuide, LLMs, and independent experts, and evaluated for accuracy, precision, recall, F1 score, and Cohen's kappa across medical test, organ, and contrast predictions. Statistical analysis included demographic stratification, confidence intervals, and p-values to ensure robust comparisons.
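A minimal sketch of this kind of scoring, not the authors' published code: it compares one rater's predictions (an LLM or an expert) against the ESR iGuide reference labels for a single task (medical test, organ, or contrast). The function name, the toy labels, and the choice of weighted averaging are all illustrative assumptions.

```python
# Sketch only: score one rater's predictions against the ESR iGuide
# reference labels for a single prediction task. All names are hypothetical.
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    precision_recall_fscore_support,
)

def score_against_reference(reference, predictions):
    """Return the metrics named in the abstract for one prediction task."""
    # Weighted averaging across classes is an assumption; the abstract
    # does not say how multi-class precision/recall/F1 were aggregated.
    precision, recall, f1, _ = precision_recall_fscore_support(
        reference, predictions, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(reference, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": cohen_kappa_score(reference, predictions),
    }

# Hypothetical usage with toy labels (reference = ESR iGuide output):
reference = ["CT abdomen", "CT head", "CT head", "CT chest"]
gpt4_preds = ["CT abdomen", "CT head", "CT chest", "CT chest"]
for name, value in score_against_reference(reference, gpt4_preds).items():
    print(f"{name}: {value:.3f}")
```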

Results: Independent experts achieved the highest accuracy (92.4%) for medical test justification, surpassing GPT-4 (88.8%) and Claude-3 Haiku (85.2%). For organ predictions, LLMs performed comparably to experts, achieving accuracies of 75.3-77.8% versus 82.6%. For contrast predictions, GPT-4 showed the highest accuracy (57.4%) among models, while Claude demonstrated poor agreement with guidelines (kappa = 0.006).
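For context (an added note, not part of the abstract): Cohen's kappa rescales the observed agreement p_o by the agreement p_e expected from chance alone,

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

so when p_o is close to p_e the numerator vanishes and kappa approaches zero regardless of raw accuracy. A value like Claude's 0.006 therefore indicates essentially chance-level agreement with the reference standard for contrast selection.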

Conclusion: Independent experts remain the most reliable, but LLMs show potential for optimization, particularly in organ prediction. A hybrid human-AI approach could enhance CT referral appropriateness and utilization. Further research should focus on improving LLM performance and exploring their integration into clinical workflows.

Key Points:
Question: Can GPT-4 and Claude-3 Haiku justify CT referrals as accurately as independent experts, using the ESR iGuide as the gold standard?
Findings: Independent experts outperformed large language models in test justification. GPT-4 and Claude-3 showed comparable organ prediction but struggled with contrast selection, limiting full automation.
Clinical relevance: While independent experts remain most reliable, integrating AI with expert oversight may improve CT referral appropriateness, optimizing resource allocation and enhancing clinical decision-making.


Source
DOI: http://dx.doi.org/10.1007/s00330-025-11608-y
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417242

Publication Analysis

Top Keywords

independent experts (28)
decision support (12)
large language (12)
language models (12)
gpt-4 claude-3 (12)
claude-3 haiku (12)
esr iguide (12)
referral justification (8)
clinical decision (8)
llms potential (8)
