Knowledge distillation (KD) makes it possible to deploy high-accuracy models on resource-limited devices and is an effective means of obtaining lightweight models. As the field advances, KD methods continue to evolve to suit different application scenarios and requirements. To transfer knowledge from larger networks to smaller, lighter ones, KD typically bridges the gap between teacher and student networks in either their probability outputs or their middle-layer representations. Unlike probability outputs, which share a common form across teacher and student, middle-layer representations vary significantly in both structure and distribution. Traditional metrics such as Euclidean distance or mean squared error (MSE) treat all intermediate features uniformly and cannot adapt to the heterogeneous feature distributions found across different layers or models; these fixed metrics often fail to account for the spatial, semantic, and statistical variations between teacher and student networks. To address this limitation, we propose a parameterized, adaptive metric based on deep Bregman divergence. The divergence function is learned from data, allowing the measurement to adjust to the underlying feature distributions at each layer and yielding more effective and robust knowledge transfer. Importantly, our method can also serve as a complementary enhancement (i.e., x+Bregman) to almost all KD methods that distill probability outputs. Extensive experiments demonstrate that our approach outperforms many existing KD methods across diverse datasets and network architectures.
DOI: http://dx.doi.org/10.1016/j.neunet.2025.108016
Neural Netw
August 2025
Inspur Electronic Information Industry Co., Ltd, China.
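To make the adaptive metric in the abstract concrete, here is a minimal sketch of a learned Bregman divergence D_phi(s, t) = phi(s) - phi(t) - <grad phi(t), s - t> used as a feature-distillation loss, where phi is an input-convex network so the divergence stays non-negative. The architecture, layer sizes, feature shapes, and loss wiring below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex network phi(x): convex in x, so the induced Bregman
    divergence D_phi(s, t) = phi(s) - phi(t) - <grad phi(t), s - t>
    is non-negative and zero when s == t."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.W0 = nn.Linear(dim, hidden)                 # first layer: unconstrained
        self.Wz = nn.Linear(hidden, hidden, bias=False)  # z-path: clamped non-negative
        self.Wx = nn.Linear(dim, hidden)                 # skip from input: unconstrained
        self.out = nn.Linear(hidden, 1, bias=False)      # output: clamped non-negative

    def forward(self, x):
        z = F.softplus(self.W0(x))
        # Non-negative z-path weights plus a convex, non-decreasing
        # activation keep phi convex in its input.
        z = F.softplus(F.linear(z, self.Wz.weight.clamp(min=0)) + self.Wx(x))
        return F.linear(z, self.out.weight.clamp(min=0)).squeeze(-1)

def bregman_distill_loss(phi, student_feat, teacher_feat):
    """Batch-mean D_phi(student, teacher) between flattened features."""
    t = teacher_feat.detach().requires_grad_(True)
    phi_t = phi(t)
    # grad phi(t) via autograd; create_graph=True so phi's parameters
    # still receive gradients through this term.
    grad_t = torch.autograd.grad(phi_t.sum(), t, create_graph=True)[0]
    d = phi(student_feat) - phi_t - (grad_t * (student_feat - t)).sum(dim=-1)
    return d.mean()

# Illustrative usage with hypothetical (batch, dim) pooled features.
phi = ICNN(dim=128)
s = torch.randn(32, 128, requires_grad=True)  # stand-in for student features
t = torch.randn(32, 128)                      # stand-in for teacher features
loss = bregman_distill_loss(phi, s, t)
loss.backward()                               # gradients reach both s and phi
```

In training, a term like this would presumably be added to a standard logit-distillation loss (the "x+Bregman" combination the abstract mentions), with phi learned jointly with the student; a linear adapter can align student and teacher feature dimensions when they differ.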
Neural Netw
August 2025
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore.
Federated Learning (FL) offers a promising privacy-preserving framework for collaborative global model training without exposing local private data. However, system heterogeneity in FL causes the straggler issue, where resource-limited edge devices delay global model aggregation. Existing approaches, such as asynchronous mechanisms or kick-out methods, primarily focus on optimizing model convergence efficiency but often overlook edge resource constraints, potentially resulting in model bias toward high-performance devices or omission of critical data.
Nat Biomed Eng
September 2025
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
The generalization ability of foundation models in the field of computational pathology (CPath) is crucial for their clinical success. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability unclear. We establish a comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 72 specific tasks.
IEEE Trans Autom Sci Eng
March 2025
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Integrating multimodal neuroimaging datasets can enhance the early detection of AD. However, models must address the challenge of incomplete modalities, a common issue in real-world scenarios, as not all patients have access to all modalities due to practical constraints such as cost and availability.
Sci Rep
August 2025
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China.
The secondary structure of a protein serves as the foundation for constructing its three-dimensional (3D) structure, which in turn is critical for determining its function and role in biological processes. Therefore, accurately predicting secondary structure not only facilitates the understanding of a protein's 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective in protein secondary structure prediction because of their ability to process complex sequence data and extract meaningful patterns, thereby increasing prediction accuracy and efficiency.