Self-supervised identification and elimination of harmful datasets in distributed machine learning for medical image analysis.

Raissa Souza, Emma A M Stanley, Anthony J Winder, Chris Kang, Kimberly Amador, Erik Y Ohara, Gabrielle Dagasso, Richard Camicioli, Oury Monchi, Zahinoor Ismail, Matthias Wilms, Nils D Forkert

NPJ Digit Med

Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.

Published: February 2025


Distributed learning enables collaborative machine learning model training without requiring cross-institutional data sharing, thereby addressing privacy concerns. However, variability in local quality control can degrade model performance, while systematic human visual inspection is time-consuming and may violate the goal of keeping data inaccessible outside the acquisition centers. This work proposes a novel self-supervised method that fully automatically identifies and eliminates harmful data during distributed model training. Harmful data is defined as samples that, when included in training, increase misdiagnosis rates. The method was tested on neuroimaging data from 83 centers for Parkinson's disease classification, with a small number of harmful samples deliberately included in simulation. The proposed method reliably identified harmful images; centers contributing only harmful data were easier to identify than single harmful images within otherwise good datasets. Although it was evaluated only on neuroimaging data, the presented method is application-agnostic and represents a step towards automated quality control in distributed learning.
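The abstract defines harmful data by its effect on misdiagnosis rates. The toy sketch below illustrates only that definition, using a leave-one-center-out comparison on hypothetical one-dimensional data with a simple nearest-centroid classifier. It is not the authors' self-supervised method: the function names (`train_centroids`, `misdiagnosis_rate`, `flag_harmful`), the centers `A`, `B` and `C`, and all data values are hypothetical, and the loop pools data centrally purely for illustration.

```python
from statistics import mean

# Hypothetical toy illustration of the "harmful data" definition only;
# not the method proposed in the paper.

def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: one mean feature value per class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def misdiagnosis_rate(centroids, val):
    """Fraction of validation samples assigned to the wrong class."""
    errors = 0
    for x, y in val:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        errors += pred != y
    return errors / len(val)

def flag_harmful(centers, val):
    """Flag centers whose inclusion raises the validation misdiagnosis rate."""
    all_samples = [s for data in centers.values() for s in data]
    with_all = misdiagnosis_rate(train_centroids(all_samples), val)
    harmful = []
    for name in centers:
        # Retrain without this center and compare error rates.
        rest = [s for other, data in centers.items() if other != name for s in data]
        without = misdiagnosis_rate(train_centroids(rest), val)
        if with_all > without:
            harmful.append(name)
    return harmful

# Hypothetical data: class 0 lies near x = 0, class 1 near x = 10.
centers = {
    "A": [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)],
    "B": [(0.5, 0), (1.5, 0), (9.5, 1), (10.5, 1)],
    "C": [(0.0, 1), (1.0, 1), (0.5, 1), (1.5, 1)],  # class-0 samples mislabeled as 1
}
val = [(1.0, 0), (3.5, 0), (8.0, 1), (9.0, 1)]

print(flag_harmful(centers, val))  # ['C']
```

Here center `C` is flagged because training with it yields a validation error of 0.25, whereas training without it yields 0.0. Removing `A` or `B` does not lower the error. Note that this brute-force ablation requires one retraining per center and still needs labeled validation data.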

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11830037
DOI: http://dx.doi.org/10.1038/s41746-025-01499-0
