Improving large language models accuracy for aortic stenosis treatment via Heart Team simulation: a prompt design analysis.

Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Aims: Large language models (LLMs) have shown potential in clinical decision support, but the influence of prompt design on their performance, particularly in complex cardiology decision-making, is not well understood.

Methods and Results: We retrospectively reviewed 231 patients evaluated by our Heart Team for severe aortic stenosis, with treatment options including surgical aortic valve replacement, transcatheter aortic valve implantation, or medical therapy. We tested multiple prompt-design strategies using zero-shot (0-shot), Chain-of-Thought (CoT), and Tree-of-Thought (ToT) prompting, combined with few-shot prompting, free/guided-thinking, and self-consistency. Patient data were condensed into standardized vignettes and queried using GPT-4o (version 2024-05-13, OpenAI) 40 times per patient under each prompt (147,840 total queries). The primary endpoint was mean accuracy; secondary endpoints included sensitivity, specificity, area under the curve (AUC), and treatment invasiveness. Guided-thinking-ToT achieved the highest accuracy (94.04%, 95% CI 90.87-97.21), significantly outperforming few-shot-ToT (87.16%, 95% CI 82.68-91.63) and few-shot-CoT (85.32%, 95% CI 80.59-90.06; P < 0.0001). Zero-shot prompting showed the lowest accuracy (73.39%, 95% CI 67.48-79.31). Guided-thinking-ToT yielded the highest AUC values (up to 0.97) and was the only prompt whose invasiveness did not differ significantly from Heart Team decisions (P = 0.078). An inverted quadratic relationship emerged between few-shot examples and accuracy, with nine examples optimal (P < 0.0001). Self-consistency improved overall accuracy, particularly for ToT-derived prompts (P < 0.001).
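As a rough illustration of the evaluation protocol described above, a minimal Python sketch of the self-consistency step (40 repeated queries per patient, majority vote, scored against the Heart Team decision) is given below. The query_llm callable, the TREATMENTS label set, and the vignette prompt are hypothetical placeholders, not the authors' code or the exact GPT-4o API call.

from collections import Counter
from typing import Callable, Sequence

# Treatment labels taken from the abstract: surgical aortic valve replacement (SAVR),
# transcatheter aortic valve implantation (TAVI), or medical therapy.
TREATMENTS = ("SAVR", "TAVI", "medical therapy")

def self_consistent_decision(query_llm: Callable[[str], str],
                             vignette_prompt: str,
                             n_queries: int = 40) -> str:
    """Query the model repeatedly and return the majority-vote treatment (self-consistency)."""
    votes = Counter(
        answer
        for answer in (query_llm(vignette_prompt) for _ in range(n_queries))
        if answer in TREATMENTS  # ignore responses that are not a valid treatment label
    )
    if not votes:
        raise ValueError("no valid treatment label returned by the model")
    return votes.most_common(1)[0][0]

def accuracy(predictions: Sequence[str], heart_team_labels: Sequence[str]) -> float:
    """Fraction of patients whose voted LLM decision matches the Heart Team decision."""
    matches = sum(pred == label for pred, label in zip(predictions, heart_team_labels))
    return matches / len(heart_team_labels)

Under these assumptions, per-prompt accuracy would be computed as accuracy([self_consistent_decision(query_llm, v) for v in vignettes], heart_team_labels), with sensitivity, specificity, and AUC derived analogously from the same predictions.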

Conclusion: Prompt design significantly impacts LLM performance in clinical decision-making for severe aortic stenosis. Tree-of-Thought prompting markedly improved accuracy and aligned recommendations with expert decisions, though LLMs tended toward conservative treatment approaches.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12282391 (PMC)
http://dx.doi.org/10.1093/ehjdh/ztaf068 (DOI Listing)

Publication Analysis

Top Keywords

aortic stenosis: 12
heart team: 12
prompt design: 12
large language: 8
language models: 8
stenosis treatment: 8
severe aortic: 8
aortic valve: 8
improved accuracy: 8
accuracy: 7
