Knowledge distillation (KD) makes it possible to deploy high-accuracy models on resource-limited devices and is an effective means of obtaining lightweight models. As the field advances, KD methods continue to evolve to suit different application scenarios and requirements. To transfer knowledge from larger networks to smaller, lighter ones, KD typically bridges the gap between teacher and student networks in either their probability outputs or their middle-layer representations. Unlike probability outputs, which share a common form across teacher and student, middle-layer representations vary significantly in both structure and distribution. Traditional metrics such as Euclidean distance or mean squared error (MSE) treat all intermediate features uniformly and cannot adapt to the heterogeneous feature distributions found across different layers or models; these fixed metrics often fail to account for the spatial, semantic, and statistical variations between teacher and student networks. To address this limitation, we propose a parameterized, adaptive metric based on deep Bregman divergence. The divergence function is learned from data, allowing the measurement to adjust to the underlying feature distributions at each layer and yielding more effective and robust knowledge transfer. Importantly, our method can also serve as a complementary enhancement (i.e., x+Bregman) to almost all KD methods that distill probability outputs. Extensive experiments demonstrate that our approach outperforms many existing KD methods across diverse datasets and network architectures.
DOI: http://dx.doi.org/10.1016/j.neunet.2025.108016
Neural Netw
August 2025
Inspur Electronic Information Industry Co., Ltd, China.
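To make the adaptive metric in the abstract concrete, here is a minimal sketch of a learned Bregman divergence D_phi(s, t) = phi(s) - phi(t) - <grad phi(t), s - t> used as a feature-distillation loss, where phi is an input-convex network so the divergence stays non-negative. The architecture, layer sizes, feature shapes, and loss wiring below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex network phi(x): convex in x, so the induced Bregman
    divergence D_phi(s, t) = phi(s) - phi(t) - <grad phi(t), s - t>
    is non-negative and zero when s == t."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.W0 = nn.Linear(dim, hidden)                 # first layer: unconstrained
        self.Wz = nn.Linear(hidden, hidden, bias=False)  # z-path: clamped non-negative
        self.Wx = nn.Linear(dim, hidden)                 # skip from input: unconstrained
        self.out = nn.Linear(hidden, 1, bias=False)      # output: clamped non-negative

    def forward(self, x):
        z = F.softplus(self.W0(x))
        # Non-negative z-path weights plus a convex, non-decreasing
        # activation keep phi convex in its input.
        z = F.softplus(F.linear(z, self.Wz.weight.clamp(min=0)) + self.Wx(x))
        return F.linear(z, self.out.weight.clamp(min=0)).squeeze(-1)

def bregman_distill_loss(phi, student_feat, teacher_feat):
    """Batch-mean D_phi(student, teacher) between flattened features."""
    t = teacher_feat.detach().requires_grad_(True)
    phi_t = phi(t)
    # grad phi(t) via autograd; create_graph=True so phi's parameters
    # still receive gradients through this term.
    grad_t = torch.autograd.grad(phi_t.sum(), t, create_graph=True)[0]
    d = phi(student_feat) - phi_t - (grad_t * (student_feat - t)).sum(dim=-1)
    return d.mean()

# Illustrative usage with hypothetical (batch, dim) pooled features.
phi = ICNN(dim=128)
s = torch.randn(32, 128, requires_grad=True)  # stand-in for student features
t = torch.randn(32, 128)                      # stand-in for teacher features
loss = bregman_distill_loss(phi, s, t)
loss.backward()                               # gradients reach both s and phi
```

In training, a term like this would presumably be added to a standard logit-distillation loss (the "x+Bregman" combination the abstract mentions), with phi learned jointly with the student; a linear adapter can align student and teacher feature dimensions when they differ.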
Neural Netw
August 2025
Information Systems Technology and Design, Singapore University of Technology and Design, Singapore.
Federated Learning (FL) offers a promising privacy-preserving framework for collaborative global model training without exposing local private data. However, system heterogeneity in FL causes the straggler issue, where resource-limited edge devices delay global model aggregation. Existing approaches, such as asynchronous mechanisms or kick-out methods, primarily focus on optimizing model convergence efficiency but often overlook edge resource constraints, potentially resulting in model bias toward high-performance devices or omission of critical data.
Nat Biomed Eng
September 2025
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
The generalization ability of foundation models in the field of computational pathology (CPath) is crucial for their clinical success. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability unclear. We establish a comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 72 specific tasks.
IEEE Trans Autom Sci Eng
March 2025
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Integrating multimodal neuroimaging datasets can enhance the early detection of AD. However, models must address the challenge of incomplete modalities, a common issue in real-world scenarios, as not all patients have access to all modalities due to practical constraints such as cost and availability.
Sci Rep
August 2025
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China.
The secondary structure of a protein serves as the foundation for constructing its three-dimensional (3D) structure, which in turn is critical for determining its function and role in biological processes. Therefore, accurately predicting secondary structure not only facilitates the understanding of a protein's 3D conformation but also provides essential insights into its interactions, functional mechanisms, and potential applications in biomedical research. Deep learning models are particularly effective in protein secondary structure prediction because of their ability to process complex sequence data and extract meaningful patterns, thereby increasing prediction accuracy and efficiency.