Applying oversampling before cross-validation will lead to high bias in radiomics.

Sci Rep

Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.

Published: May 2024


Citations: 20

Article Abstract

Class imbalance is often unavoidable in radiomic data collected from clinical routine. It can cause problems during classifier training because the majority class can dominate the minority class. Resampling methods such as oversampling or undersampling are therefore applied to balance the classes. However, resampling must not be applied upfront to all data, because this leads to data leakage and, therefore, to erroneous results. This study aims to measure the extent of the resulting bias. Five-fold cross-validation with 30 repeats was performed on a set of 15 radiomic datasets to train predictive models. The training involved two scenarios. In the first, the models were trained correctly by applying the resampling methods during the cross-validation, that is, to the training data only. In the second, the models were trained incorrectly by resampling all the data before cross-validation. The bias was defined empirically as the difference between the best-performing models in the two scenarios in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, balanced accuracy, and the Brier score. In addition, a simulation study was performed on a randomly generated dataset for verification. The results demonstrated that incorrectly applying the oversampling methods to all data resulted in a large positive bias (up to 0.34 in AUC, 0.33 in sensitivity, 0.31 in specificity, and 0.37 in balanced accuracy). The bias depended on the data balance, and an increase of approximately 0.10 in the AUC was observed for each increase in imbalance. The models also showed a bias in calibration as measured by the Brier score, which differed by up to -0.18 between the correctly and incorrectly trained models. The undersampling methods were not significantly affected by bias. These results emphasize that any resampling method must be applied only to the training data to avoid data leakage and, subsequently, biased model performance and calibration.
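The two scenarios described in the abstract can be illustrated with a small, self-contained sketch. It is not the study's code: the pure-noise dataset, the random-duplication oversampler, the 1-nearest-neighbour classifier, the pooled AUC, and all function names are assumptions chosen so that the example runs with the Python standard library alone. It uses a single 5-fold run rather than 30 repeats. Because the features carry no signal, an unbiased estimate of the AUC should lie near 0.5.

```python
import random

def make_noise_data(n_major, n_minor, n_features, rng):
    """Pure-noise features (assumption): labels are unrelated to the features."""
    n = n_major + n_minor
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [0] * n_major + [1] * n_minor
    return X, y

def oversample(X, y, rng):
    """Random oversampling: duplicate minority (label 1) samples until balanced."""
    minority = [i for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    if not minority or n_extra <= 0:
        return X, y
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [1] * n_extra

def nearest_label(train_X, train_y, x):
    """1-nearest-neighbour score: the label of the closest training sample."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    return train_y[min(range(len(train_X)), key=sq_dist)]

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_val_auc(X, y, rng, k=5, resample_in_fold=False):
    """k-fold cross-validation; predictions are pooled over folds for one AUC."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    y_true, scores = [], []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        if resample_in_fold:
            # Correct: only the training part of this fold is resampled.
            train_X, train_y = oversample(train_X, train_y, rng)
        for i in fold:
            y_true.append(y[i])
            scores.append(nearest_label(train_X, train_y, X[i]))
    return auc(y_true, scores)

rng = random.Random(0)
X, y = make_noise_data(n_major=180, n_minor=20, n_features=10, rng=rng)

# Scenario 1 (correct): oversampling happens inside each training fold.
honest_auc = cross_val_auc(X, y, random.Random(1), resample_in_fold=True)

# Scenario 2 (incorrect): oversample everything first, then cross-validate.
X_all, y_all = oversample(X, y, random.Random(2))
leaky_auc = cross_val_auc(X_all, y_all, random.Random(3), resample_in_fold=False)

print(f"correct AUC: {honest_auc:.2f}  incorrect AUC: {leaky_auc:.2f}")
```

In the incorrect scenario, copies of the same minority sample land in both the training and test parts of a fold, so the classifier is partly evaluated on samples it has already seen. The gap between the two printed values is a toy analogue of the bias the study quantifies; its size here depends on the assumed classifier and data and should not be read as a reproduction of the reported figures.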

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11109211
DOI: http://dx.doi.org/10.1038/s41598-024-62585-z
