Autonomous medical evaluation for guideline adherence of large language models.

NPJ Digit Med

Department of Diagnostic and Interventional Radiology, Technical University of Munich, School of Medicine and Health, Klinikum rechts der Isar, TUM University Hospital, Munich, Germany.

Published: December 2024


Article Abstract

Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate large language models' adherence to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It provides an evaluation framework and methodology to assess models' capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. The benchmark comprises 135 questions and 1337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest with 41.9/50, followed closely by Llama-3 70B and WizardLM-2-8x22B. For comparison, a recent medical graduate scored 25.8/50. The benchmark introduces novel content to avoid the issue of LLMs memorizing existing medical data. AMEGA's publicly available code supports further research in AI-assisted clinical decision-making, aiming to enhance patient care by aiding clinicians in diagnosis and treatment under time constraints.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11638254
DOI: http://dx.doi.org/10.1038/s41746-024-01356-6

Similar Publications

Objectives: To assess patterns across 21 countries in dentists' thresholds for initiating operative treatment of active non-cavitated carious lesions and to evaluate the influence of caries risk, clinician characteristics, and geographic variation on decision-making in accordance with current guidelines.

Methods: A cross-sectional, vignette-style web-based survey was conducted between June and October 2023 across 21 countries. A standardized questionnaire, comprising theoretical radiographic scenarios of occlusal and approximal active non-cavitated carious lesions at four progressive stages (E1, E2, EDJ, D1), was distributed to general dentists and specialists.

Heart failure (HF) remains a global health challenge that imposes a significant clinical and economic burden. Despite the availability of guideline-directed medical therapy (GDMT), treatment adherence remains a major challenge in the management of HF. Polypharmacy and regimen complexity contribute to poor adherence, particularly among older adults and in resource-limited settings.

Background: Although procedure-specific guidelines have been established for postoperative opioid prescribing in the elective setting, it is unknown to what extent prescriptions in the emergency setting adhere to these standards. Variation in opioid prescribing for emergency general surgery patients may represent context-appropriate deviation or an opportunity for improved stewardship.

Methods: Leveraging data from a statewide Acute Care Surgery collaborative, we identified patients undergoing 4 common procedures in the emergency setting: laparoscopic appendectomy, laparoscopic cholecystectomy, emergency hernia repair, and open colectomy.

Importance: Among men with favorable-risk (ie, low-risk or favorable intermediate-risk) prostate cancer, confirmatory testing substantially improves the detection of aggressive cancers that may merit treatment instead of conservative management. Despite guideline recommendations, confirmatory testing is inconsistently used, and more than half of men do not receive it. Value-based interventions and payment incentives may improve care quality by motivating adherence to guideline-concordant care.

Objective: This study aimed to enhance hand hygiene compliance among healthcare workers (HCWs) to reduce the incidence of hospital-acquired infections (HAIs) by employing the Plan-Do-Check-Act (PDCA) cycle, a quality management approach introduced by W. Edwards Deming.

Method: A tailored Hand Hygiene Survey Form was developed based on the Hand Hygiene Technical Specification for Healthcare Personnel and WHO guidelines.
