Citations

20

Article Abstract

This study evaluated large language models (LLMs) using 30 questions, each derived from a recommendation in the 2024 European Society of Cardiology (ESC) guidelines for the management of atrial fibrillation (AF). The recommendations were stratified by class of recommendation and level of evidence. The primary objective was to assess the reliability and consistency of LLM-generated classifications against those in the ESC guidelines. The study also assessed how prompting strategy and working language affect LLM performance. Three prompting strategies were tested: input-output (IO), zero-shot chain-of-thought (0-COT), and performed chain-of-thought (P-COT) prompting. Each question was presented in both English and Chinese to three LLMs: ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Reliability across the LLM-prompt combinations showed moderate to substantial agreement (Fleiss' kappa ranged from 0.449 to 0.763). Claude 3.5 Sonnet with P-COT prompting achieved the highest consistency in recommendation classification (60.3%). No significant differences between English and Chinese were observed for most LLM-prompt combinations. Bias analysis of the inconsistent outcomes showed that most LLM-prompt combinations tended to assign a more favorable class of recommendation and a stronger level of evidence than the guidelines. The characteristics of the clinical questions may also influence LLM performance. These findings highlight the limited accuracy of LLM responses to AF-related questions. Repeated queries are advisable to obtain a more complete picture of model output. Future efforts should focus on expanding the use of diverse prompting strategies, continuing model evaluation and refinement, and establishing a comprehensive, objective benchmarking system.
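The two metrics named in the abstract, Fleiss' kappa for agreement across repeated queries and the consistency rate against the guideline classification, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the data layout (one list of class labels per question, one label per repeated query), and the sample labels below are assumptions made purely for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated the same number of times.

    ratings: list of lists; ratings[i] holds the category labels assigned to
    item i (here, hypothetically, one label per repeated LLM query).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined in that case.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def consistency_rate(responses, reference):
    """Share of individual responses that match the reference label."""
    matches = sum(
        label == ref for row, ref in zip(responses, reference) for label in row
    )
    return matches / sum(len(row) for row in responses)


if __name__ == "__main__":
    # Hypothetical data: 3 questions, 3 repeated queries each.
    llm_classes = [["I", "I", "IIa"], ["III", "III", "III"], ["IIb", "IIa", "IIa"]]
    guideline_classes = ["I", "III", "IIb"]
    print("Fleiss' kappa:", fleiss_kappa(llm_classes))
    print("Consistency rate:", consistency_rate(llm_classes, guideline_classes))
```

Under this layout, kappa describes how stable a model's answers are across repeated queries, while the consistency rate describes how often those answers match the guideline, so a model can score well on one and poorly on the other.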


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125184
DOI: http://dx.doi.org/10.1038/s41598-025-04309-5

