Category Ranking

98%

Total Visits

921

Avg Visit Duration

2 minutes

Citations

20

Article Abstract

Purpose: Federated Learning helps training deep learning networks with diverse data from different locations, particularly in restricted clinical settings. However, label distributions overlapping only partially across clients, due to different demographics, may significantly harm the global training, and thus local model performance. Investigating such effects before rolling out large-scale Federated Learning setups requires proper sampling of the expected label distributions.

Methods: We present a sampling algorithm to build data subsets according to desired mean and standard deviations from an initial global distribution. To this end, we incorporate the chi-squared and Gini impurity measures to numerically optimize label distributions for multiple groups in an efficient fashion.

Results: Using a real-world application scenario, we sample train and test groups according to region-specific distributions for 3D camera-based weight and height estimation in a clinical context, comparing a hard data split serving as a baseline with our proposed sampling technique. We train a baseline model on all data for comparison and use Federated Averaging to combine the training of our data subsets, demonstrating a realistic deterioration of 25.3 % on weight and 28.7 % on height estimations by the global model.

Conclusions: Realistically client-biased label distribution can notably harm the training in a federated context. Our sampling algorithm for simulating realistic data distributions opens up an efficient way for prior analysis of this effect. The technique is agnostic to the chosen network architecture and target scenario and can be adapted to any feature or label problem with non-IID subpopulations.

Download full-text PDF

Source
http://dx.doi.org/10.1007/s11548-025-03504-zDOI Listing

Publication Analysis

Top Keywords

federated learning
12
sampling technique
8
label distributions
8
sampling algorithm
8
data subsets
8
data
6
federated
5
label
5
robust sampling
4
technique realistic
4

Similar Publications

Learning from history for personalized federated learning.

Neural Netw

September 2025

College of Information Science, North China University of Technology, Beijing, China. Electronic address:

Personalized Federated Learning (pFL) has received extensive attentions, due to its ability to effectively process non-IID data distributed among different clients. However, most of the existing pFL methods focus on the collaboration between global and local models to enrich the personalization process, but ignoring a lot of valuable historical information, which represents the unique learning trajectory of each client. In this paper, we propose a pFL method called FedLFH, which introduces a tracking variable that allows each client to preserve historical information to facilitate personalization.

View Article and Find Full Text PDF

Applications of Federated Large Language Model for Adverse Drug Reactions Prediction: Scoping Review.

J Med Internet Res

September 2025

Department of Information Systems and Cybersecurity, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX, 78249, United States, 1 (210) 458-6300.

Background: Adverse drug reactions (ADR) present significant challenges in health care, where early prevention is vital for effective treatment and patient safety. Traditional supervised learning methods struggle to address heterogeneous health care data due to their unstructured nature, regulatory constraints, and restricted access to sensitive personal identifiable information.

Objective: This review aims to explore the potential of federated learning (FL) combined with natural language processing and large language models (LLMs) to enhance ADR prediction.

View Article and Find Full Text PDF

Protein kinases are central regulators of cell signaling and play pivotal roles in a wide array of diseases, most notably cancer and autoimmune disorders. The clinical success of kinase inhibitors-such as imatinib and osimertinib-has firmly established kinases as valuable drug targets. However, the development of selective, potent inhibitors remains challenging due to the conserved nature of the ATP-binding site, off-target effects, resistance mutations, and patient-specific variability.

View Article and Find Full Text PDF

Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of "gold standard" disease labels makes the development of machine learning models a challenging task. Additionally, imbalances in demographic representation within datasets compromise the development of unbiased healthcare solutions.

View Article and Find Full Text PDF

Early diagnosis of Parkinson's disease (PD) is crucial for timely treatment and disease management. Recent studies link PD to impaired facial muscle control, manifesting as "masked face" symptoms, offering a novel diagnostic approach through facial expression analysis. However, data privacy concerns and legal restrictions have resulted in significant "data silos", hindering data sharing and limiting the accuracy and generalizability of existing diagnostic models due to small, localized datasets.

View Article and Find Full Text PDF