A robust sampling technique for realistic distribution simulation in federated learning.

Robin Hoepp, Leonhard Rist, Alexander Katzmann, Raghavan Ashok, Andreas Wimmer, Michael Sühling, Andreas Maier

Int J Comput Assist Radiol Surg

Pattern Recognition Lab, FAU Erlangen-Nürnberg, Erlangen, Germany.

Published: September 2025

Purpose: Federated Learning helps train deep learning networks on diverse data from different locations, particularly in restricted clinical settings. However, label distributions that overlap only partially across clients, due to different demographics, may significantly harm global training and thus local model performance. Investigating such effects before rolling out large-scale Federated Learning setups requires proper sampling of the expected label distributions.

Methods: We present a sampling algorithm that builds data subsets with desired means and standard deviations from an initial global distribution. To this end, we incorporate the chi-squared and Gini impurity measures to numerically and efficiently optimize label distributions for multiple groups.
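The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a single continuous label (for example body weight), a normal target shape, and a simple per-bin quota draw; the function names `normal_bin_probs`, `chi_squared`, `gini_impurity`, and `sample_subset` are hypothetical. The chi-squared statistic and Gini impurity are reported here only as diagnostics of the drawn subset; how the paper combines them is not stated in the abstract.

```python
import math
import numpy as np

def normal_bin_probs(edges, mean, std):
    """Probability mass of N(mean, std^2) in each histogram bin, renormalized to 1."""
    cdf = np.array([0.5 * (1.0 + math.erf((e - mean) / (std * math.sqrt(2.0))))
                    for e in edges])
    probs = np.diff(cdf)
    return probs / probs.sum()

def chi_squared(observed, expected):
    """Pearson chi-squared statistic, skipping bins with zero expectation."""
    mask = expected > 0
    return float(np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask]))

def gini_impurity(counts):
    """Gini impurity 1 - sum(p_k^2) of a histogram of label counts."""
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def sample_subset(values, mean, std, n, n_bins=20, seed=0):
    """Draw about n indices whose label histogram approximates N(mean, std^2).

    Assumption: a fixed per-bin quota stands in for the numerical
    optimization described in the paper.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Bin index of every value; clip so the maximum lands in the last bin.
    bin_idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    expected = n * normal_bin_probs(edges, mean, std)
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_idx == b)
        k = min(int(round(expected[b])), members.size)  # cannot exceed availability
        if k > 0:
            chosen.extend(rng.choice(members, size=k, replace=False))
    chosen = np.array(chosen, dtype=int)
    observed = np.bincount(bin_idx[chosen], minlength=n_bins).astype(float)
    return chosen, chi_squared(observed, expected), gini_impurity(observed)

# Usage with synthetic labels (illustrative values, not data from the paper).
pool = np.random.default_rng(1).normal(75.0, 15.0, size=5000)
idx, chi2, gini = sample_subset(pool, mean=65.0, std=8.0, n=500)
print(pool[idx].mean(), pool[idx].std(), chi2, gini)
```

A low chi-squared value indicates that the subset histogram is close to the requested shape. Calling `sample_subset` once per simulated client with different `mean` and `std` values yields subsets with deliberately shifted label distributions drawn from one global pool.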

Results: Using a real-world application scenario, we sample training and test groups according to region-specific distributions for 3D camera-based weight and height estimation in a clinical context, comparing a hard data split that serves as a baseline with our proposed sampling technique. We train a baseline model on all data for comparison and use Federated Averaging to combine the training on our data subsets, demonstrating a realistic deterioration of 25.3% in weight and 28.7% in height estimation by the global model.
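For readers unfamiliar with Federated Averaging, the aggregation step can be sketched as a sample-size-weighted mean of the client parameters. The snippet below is an illustrative sketch only: the function name `federated_average` and the dictionary-of-arrays parameter layout are assumptions, and the abstract gives no details of the training configuration actually used.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted mean of client parameters, weighted by local sample count.

    client_weights: list of dicts mapping parameter name -> np.ndarray
    client_sizes:   list of local training-set sizes, one per client
    """
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Usage: two hypothetical clients holding 100 and 300 samples.
clients = [{"w": np.array([1.0, 1.0])}, {"w": np.array([4.0, 4.0])}]
global_params = federated_average(clients, [100, 300])
print(global_params["w"])
```

Because larger clients receive larger weights, a client whose label distribution differs strongly from the others can pull the global model toward its own subpopulation, which is the kind of effect the sampling technique is meant to expose before deployment.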

Conclusions: Realistic client-biased label distributions can notably harm training in a federated context. Our sampling algorithm for simulating realistic data distributions offers an efficient way to analyze this effect in advance. The technique is agnostic to the chosen network architecture and target scenario and can be adapted to any feature or label problem with non-IID subpopulations.

DOI: http://dx.doi.org/10.1007/s11548-025-03504-z

Similar Publications

Learning from history for personalized federated learning.

Neural Netw

September 2025

College of Information Science, North China University of Technology, Beijing, China.

Yingxun Fu, Shulan Yin, Li Ma, Jie Liu

Personalized Federated Learning (pFL) has received extensive attention due to its ability to effectively process non-IID data distributed among different clients. However, most existing pFL methods focus on the collaboration between global and local models to enrich the personalization process but ignore a lot of valuable historical information, which represents the unique learning trajectory of each client. In this paper, we propose a pFL method called FedLFH, which introduces a tracking variable that allows each client to preserve historical information to facilitate personalization.

Applications of Federated Large Language Model for Adverse Drug Reactions Prediction: Scoping Review.

J Med Internet Res

September 2025

Department of Information Systems and Cybersecurity, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX, 78249, United States, 1 (210) 458-6300.

David Guo, Kim-Kwang Raymond Choo

Background: Adverse drug reactions (ADR) present significant challenges in health care, where early prevention is vital for effective treatment and patient safety. Traditional supervised learning methods struggle to address heterogeneous health care data due to its unstructured nature, regulatory constraints, and restricted access to sensitive personally identifiable information.

Objective: This review aims to explore the potential of federated learning (FL) combined with natural language processing and large language models (LLMs) to enhance ADR prediction.

Leveraging artificial intelligence and machine learning in kinase inhibitor development: advances, challenges, and future prospects.

RSC Med Chem

August 2025

Pharmaceutical Organic Chemistry Department, Faculty of Pharmacy, Suez Canal University 4.5 Km the Ring Road Ismailia 41522 Egypt.

Mohamed S Elgawish, Aya M Almatary, Sawsan A Zaitone, Mohamed S H Salem

Protein kinases are central regulators of cell signaling and play pivotal roles in a wide array of diseases, most notably cancer and autoimmune disorders. The clinical success of kinase inhibitors such as imatinib and osimertinib has firmly established kinases as valuable drug targets. However, the development of selective, potent inhibitors remains challenging due to the conserved nature of the ATP-binding site, off-target effects, resistance mutations, and patient-specific variability.

Enhancing Genetic Risk Prediction through Federated Semi-Supervised Transfer Learning with Inaccurate Electronic Health Record Data.

Stat Biosci

August 2024

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.

Yuying Lu, Tian Gu, Rui Duan

Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of "gold standard" disease labels makes the development of machine learning models a challenging task. Additionally, imbalances in demographic representation within datasets compromise the development of unbiased healthcare solutions.

A Tri-Factor Adaptive Federated Learning Framework for Parkinson's Disease Diagnosis via Multi-Source Facial Expression Analysis.

IEEE J Biomed Health Inform

September 2025

Meng Pang, Houwei Xu, Zheng Huang, Yintao Zhou, Shengbo Chen

Early diagnosis of Parkinson's disease (PD) is crucial for timely treatment and disease management. Recent studies link PD to impaired facial muscle control, manifesting as "masked face" symptoms, offering a novel diagnostic approach through facial expression analysis. However, data privacy concerns and legal restrictions have resulted in significant "data silos", hindering data sharing and limiting the accuracy and generalizability of existing diagnostic models due to small, localized datasets.
