This article provides radiologists with practical recommendations for evaluating AI performance in radiology, ensuring alignment with clinical goals and patient safety. It outlines key performance metrics, including overlap metrics for segmentation, test-based metrics (e.g., sensitivity, specificity, and area under the receiver operating characteristic curve), and outcome-based metrics (e.g., precision, negative predictive value, F1-score, Matthews correlation coefficient, and area under the precision-recall curve). Key recommendations emphasize local validation using independent datasets, selecting task-specific metrics, and considering deployment context to ensure real-world performance matches claimed efficacy. Common pitfalls, such as overreliance on a single metric, misinterpretation in low-prevalence settings, and failure to account for clinical workflow, are addressed with mitigation strategies. Additional guidance is provided on threshold selection, prevalence-adjusted evaluation, and AI-generated image quality assessment. This guide equips radiologists to critically evaluate both commercially available and in-house developed AI tools, ensuring their safe and effective integration into clinical practice.

CLINICAL RELEVANCE STATEMENT: This review provides guidance on selecting and interpreting AI performance metrics in radiology, ensuring clinically meaningful evaluation and safe deployment of AI tools. By addressing common pitfalls and promoting standardized reporting, it supports radiologists in making informed decisions, ultimately improving diagnostic accuracy and patient outcomes.

KEY POINTS:
Radiologists must critically evaluate reported performance metrics, because they reflect acceptable performance on specific datasets but do not guarantee clinical utility. Independent evaluation tailored to the clinical setting is essential.
Performance metrics must align with the intended task of the AI application (segmentation, detection, or classification) and be selected based on domain knowledge and clinical context.
Sensitivity, specificity, area under the ROC curve, and accuracy must be interpreted together with prevalence-dependent metrics (e.g., precision, F1-score, and Matthews correlation coefficient) calculated for the target population to ensure safe and effective clinical use.
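The last key point, on recalculating prevalence-dependent metrics for the target population, can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`expected_counts`, `prevalence_dependent_metrics`) and the example values of 90% sensitivity and specificity are assumptions chosen purely for illustration. The sketch derives expected confusion-matrix counts from sensitivity, specificity, and a local prevalence, then computes precision, negative predictive value, F1-score, and the Matthews correlation coefficient from those counts.

```python
from math import sqrt


def expected_counts(sensitivity, specificity, prevalence, n=100_000):
    """Expected confusion-matrix counts in a hypothetical population of size n."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    positives = prevalence * n
    negatives = n - positives
    tp = sensitivity * positives   # diseased cases correctly flagged
    fn = positives - tp            # diseased cases missed
    tn = specificity * negatives   # healthy cases correctly cleared
    fp = negatives - tn            # healthy cases falsely flagged
    return tp, fp, tn, fn


def prevalence_dependent_metrics(sensitivity, specificity, prevalence):
    """Precision, NPV, F1-score, and MCC at a given prevalence (illustrative helper)."""
    tp, fp, tn, fn = expected_counts(sensitivity, specificity, prevalence)
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) > 0 else nan
    npv = tn / (tn + fn) if (tn + fn) > 0 else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else nan
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else nan
    return {"precision": precision, "npv": npv, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Assumed example: the same test characteristics at two different prevalences.
    for prevalence in (0.50, 0.01):
        metrics = prevalence_dependent_metrics(0.90, 0.90, prevalence)
        summary = ", ".join(f"{name}={value:.3f}" for name, value in metrics.items())
        print(f"prevalence={prevalence:.2f}: {summary}")
```

Under these assumed values, precision is 0.90 at 50% prevalence but falls to about 0.083 at 1% prevalence, even though sensitivity and specificity are unchanged. This is the low-prevalence misinterpretation the abstract warns about, and it is why the local prevalence needs to enter the evaluation.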
DOI: http://dx.doi.org/10.1007/s00330-025-11890-w