Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance.

Candelaria Mosquera, Luciana Ferrer, Diego H Milone, Daniel Luna, Enzo Ferrante

Eur Radiol

Institute for Signals, Systems, and Computational Intelligence, sinc(i) CONICET-UNL, Santa Fe, Argentina.

Published: December 2024

Purpose: This work aims to assess standard evaluation practices used by the research community for evaluating medical imaging classifiers, with a specific focus on the implications of class imbalance. The analysis is performed on chest X-rays as a case study and encompasses a comprehensive model performance definition, considering both discriminative capabilities and model calibration.

Materials And Methods: We conduct a concise literature review to examine prevailing scientific practices used when evaluating X-ray classifiers. Then, we perform a systematic experiment on two major chest X-ray datasets to showcase a didactic example of the behavior of several performance metrics under different class ratios and highlight how widely adopted metrics can conceal performance in the minority class.
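The effect of class ratio on common metrics can be illustrated with a small synthetic simulation. The sketch below is not the authors' experiment: the Gaussian score distributions, the sample size, the prevalences, and the helper names (`roc_auc`, `average_precision`, `simulate`) are all assumptions chosen for illustration. It keeps the classifier's score distributions fixed and varies only the proportion of positives.

```python
import random

def roc_auc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks (i+1 .. j).
        avg_rank = (i + 1 + j) / 2
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties broken arbitrarily)."""
    order = sorted(range(len(labels)), key=lambda k: scores[k], reverse=True)
    true_pos, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / true_pos

def simulate(prevalence, n=50_000, seed=0):
    """Assumed synthetic scores: negatives ~ N(0, 1), positives ~ N(1.5, 1)."""
    rng = random.Random(seed)
    n_pos = round(n * prevalence)
    labels = [1] * n_pos + [0] * (n - n_pos)
    scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
    return roc_auc(labels, scores), average_precision(labels, scores)

# Same score distributions, three different class ratios.
results = {p: simulate(p) for p in (0.5, 0.1, 0.01)}
for p, (auc, ap) in results.items():
    print(f"prevalence={p:<5} AUC-ROC={auc:.3f}  AUC-PR={ap:.3f}")
```

In this setup AUC-ROC stays roughly constant across prevalences, because it does not depend on the class ratio, while AUC-PR falls as positives become rarer. This is one way a metric that looks stable can hide how hard the minority class is to detect at a realistic prevalence.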

Results: Our literature study confirms that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) it is still uncommon to include calibration studies for chest X-ray classifiers, despite their importance in the context of healthcare. Moreover, our systematic experiments confirm that current evaluation practices may not reflect model performance in real clinical scenarios and suggest complementary metrics to better reflect the performance of the system in such scenarios.

Conclusion: Our analysis underscores the need for enhanced evaluation practices, particularly in the context of class-imbalanced chest X-ray classifiers. We recommend the inclusion of complementary metrics such as the area under the precision-recall curve (AUC-PR), adjusted AUC-PR, and balanced Brier score, to offer a more accurate depiction of system performance in real clinical scenarios, considering metrics that reflect both discrimination and calibration performance.
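To make the calibration side of this recommendation concrete, the sketch below compares the standard Brier score with a class-balanced variant. The abstract does not give the paper's exact definition of the balanced Brier score, so the version here, the unweighted mean of the per-class Brier scores, is an assumption and may differ from the authors' formulation. The function names and the toy data are hypothetical.

```python
def brier_score(labels, probs):
    """Standard Brier score: mean squared error between predicted probability and label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def balanced_brier_score(labels, probs):
    """Assumed class-balanced variant: average of the Brier scores of each class.

    Each class contributes equally regardless of how many samples it has.
    Requires at least one sample of each class.
    """
    pos_errors = [(p - 1) ** 2 for y, p in zip(labels, probs) if y == 1]
    neg_errors = [p ** 2 for y, p in zip(labels, probs) if y == 0]
    return 0.5 * (sum(pos_errors) / len(pos_errors) + sum(neg_errors) / len(neg_errors))

# Toy data: 1% prevalence, and a model that always outputs the prevalence.
labels = [1] * 10 + [0] * 990
probs = [0.01] * len(labels)

print(f"Brier score:          {brier_score(labels, probs):.4f}")
print(f"Balanced Brier score: {balanced_brier_score(labels, probs):.4f}")
```

On this toy data the standard Brier score is about 0.0099, which looks excellent, while the balanced variant is about 0.4901, because the constant prediction is badly wrong on every positive case. The gap shows how a score averaged over all samples can be dominated by the majority class.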

Clinical Relevance Statement: This study underscores the critical need for refined evaluation metrics in medical imaging classifiers, emphasizing that prevalent metrics may mask poor performance in minority classes, potentially impacting clinical diagnoses and healthcare outcomes.

Key Points: Common scientific practices in papers dealing with X-ray computer-assisted diagnosis (CAD) systems may be misleading. We highlight limitations in reporting of evaluation metrics for X-ray CAD systems in highly imbalanced scenarios. We propose adopting alternative metrics based on experimental evaluation on large-scale datasets.

DOI: http://dx.doi.org/10.1007/s00330-024-10834-0
