Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate how well large language models (LLMs) adhere to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It provides an evaluation framework and methodology for assessing models' capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. The benchmark comprises 135 questions and 1,337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest (41.9/50), followed closely by Llama-3 70B and WizardLM-2-8x22B; for comparison, a recent medical graduate scored 25.8/50. The benchmark uses novel content to reduce the risk that LLMs simply reproduce memorized medical data. AMEGA's publicly available code supports further research in AI-assisted clinical decision-making, with the aim of improving patient care by helping clinicians with diagnosis and treatment under time constraints.
Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11638254
DOI: http://dx.doi.org/10.1038/s41746-024-01356-6
J Dent
September 2025
Department of Endodontics, Recep Tayyip Erdogan University, Turkey. Electronic address:
Objectives: To assess patterns, across 21 countries, in dentists' thresholds for initiating operative treatment of active non-cavitated carious lesions, and to evaluate the influence of caries risk, clinician characteristics, and geographic variation on whether treatment decisions follow current guidelines.
Methods: A cross-sectional, vignette-style, web-based survey was conducted across 21 countries between June and October 2023. A standardized questionnaire, comprising theoretical radiographic scenarios of occlusal and approximal active non-cavitated carious lesions at four progressive stages (E1, E2, EDJ, D1), was distributed to general dentists and specialists.
Heart Fail Rev
September 2025
Department of Medicine, Population Health Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Canada.
Heart failure (HF) remains a global health challenge that imposes a significant clinical and economic burden. Despite the availability of guideline-directed medical therapy (GDMT), adherence to it is a major obstacle in the management of HF. Polypharmacy and regimen complexity contribute to poor adherence, particularly among older adults and in resource-limited settings.
Surgery
September 2025
Department of Surgery, University of Michigan Medical School, Ann Arbor, MI; Center for Healthcare Outcomes and Policy, University of Michigan, Ann Arbor, MI.
Background: Although procedure-specific guidelines have been established for postoperative opioid prescribing in the elective setting, it is unknown to what extent prescriptions in the emergency setting adhere to these standards. Variation in opioid prescribing for emergency general surgery patients may represent context-appropriate deviation or an opportunity for improved stewardship.
Methods: Leveraging data from a statewide Acute Care Surgery collaborative, we identified patients undergoing 4 common procedures in the emergency setting: laparoscopic appendectomy, laparoscopic cholecystectomy, emergency hernia repair, and open colectomy.
JAMA Netw Open
September 2025
Dow Division of Health Services Research, Department of Urology, University of Michigan, Ann Arbor.
Importance: Among men with favorable-risk (ie, low-risk or favorable intermediate-risk) prostate cancer, confirmatory testing substantially improves the detection of aggressive cancers that may merit treatment instead of conservative management. Despite guideline recommendations, confirmatory testing is inconsistently used, and more than half of men do not receive it. Value-based interventions and payment incentives may improve care quality by motivating adherence to guideline-concordant care.
Front Public Health
September 2025
Management Office, Jiangsu Provincial Geriatric Hospital (Jiangsu Province Official Hospital), Nanjing, China.
Objective: This study aimed to improve hand hygiene compliance among healthcare workers (HCWs), and thereby reduce the incidence of hospital-acquired infections (HAIs), by applying the Plan-Do-Check-Act (PDCA) cycle, a quality management approach introduced by W. Edwards Deming.
Method: A tailored Hand Hygiene Survey Form was developed based on the Hand Hygiene Technical Specification for Healthcare Personnel and WHO guidelines.