Applying oversampling before cross-validation will lead to high bias in radiomics.

Sci Rep

Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.

Published: May 2024

Class imbalance is often unavoidable in radiomic data collected during clinical routine. It can cause problems during classifier training because the majority class can dominate the minority class. Consequently, resampling methods such as oversampling or undersampling are applied to balance the classes. However, resampling must not be applied upfront to all data, because doing so leads to data leakage and, therefore, to erroneous results. This study aims to measure the extent of this bias. Five-fold cross-validation with 30 repeats was performed on a set of 15 radiomic datasets to train predictive models. The training involved two scenarios. In the first, the models were trained correctly by applying the resampling methods within the cross-validation. In the second, the models were trained incorrectly by resampling all the data before cross-validation. The bias was defined empirically as the difference between the best-performing models in the two scenarios in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, balanced accuracy, and the Brier score. In addition, a simulation study was performed on a randomly generated dataset for verification. The results demonstrated that incorrectly applying the oversampling methods to all data resulted in a large positive bias (up to 0.34 in AUC, 0.33 in sensitivity, 0.31 in specificity, and 0.37 in balanced accuracy). The bias depended on the data balance: an increase of approximately 0.10 in AUC was observed for each increase in imbalance. The models also showed a bias in calibration, measured using the Brier score, which differed by up to -0.18 between the correctly and incorrectly trained models. The undersampling methods were not significantly affected by bias. These results emphasize that any resampling method should be applied only to the training data to avoid data leakage and, subsequently, biased model performance and calibration.
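The two scenarios can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes random oversampling by duplication, a 1-nearest-neighbour classifier, a single five-fold split, and pure-noise features, so any score above chance can only come from leakage. All function names are hypothetical.

```python
import random

def make_noise_data(n=200, n_features=10, minority_frac=0.2, seed=0):
    """Pure-noise features (assumption): labels carry no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    n_min = int(n * minority_frac)
    return X, [1] * n_min + [0] * (n - n_min)

def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority (label 1) samples until balanced."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def predict_1nn(X_train, y_train, x):
    """Return the label of the closest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]

def balanced_accuracy(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    sens = sum(t == 1 and p == 1 for t, p in pairs) / y_true.count(1)
    spec = sum(t == 0 and p == 0 for t, p in pairs) / y_true.count(0)
    return (sens + spec) / 2

def cross_validate(X, y, rng, oversample_in_fold, k=5):
    """k-fold CV; optionally oversample the training part of each fold only."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        if oversample_in_fold:
            X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
        for i in test:
            y_true.append(y[i])
            y_pred.append(predict_1nn(X_tr, y_tr, X[i]))
    return balanced_accuracy(y_true, y_pred)

def estimate_bias(seed=0):
    X, y = make_noise_data(seed=seed)
    rng = random.Random(seed + 1)
    # Correct: resampling happens inside each training fold.
    correct = cross_validate(X, y, rng, oversample_in_fold=True)
    # Incorrect: resample everything first, so copies of one sample can
    # land in both the training and the test fold.
    X_all, y_all = random_oversample(X, y, rng)
    incorrect = cross_validate(X_all, y_all, rng, oversample_in_fold=False)
    return correct, incorrect

correct, incorrect = estimate_bias()
print(f"correct: {correct:.2f}  incorrect: {incorrect:.2f}  "
      f"bias: {incorrect - correct:.2f}")
```

In the incorrect scenario, duplicated minority samples end up on both sides of the split, so the nearest neighbour of a test sample is frequently its own copy. This is one simple mechanism by which upfront oversampling can inflate performance estimates.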

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11109211	PMC
http://dx.doi.org/10.1038/s41598-024-62585-z	DOI Listing

Publication Analysis

Top Keywords

data, applying oversampling, resampling methods, data leakage, models trained, balanced accuracy, brier score, bias, models, resampling
