Thyroid cancer is one of the most common types of cancer, and pathological diagnosis based on Fine Needle Aspiration Cytology is the clinical standard for assessing it. However, the complex structure and large data volume of thyroid pathology images make accurate and efficient automatic diagnosis difficult. To address this practical problem, this paper proposes a knowledge distillation method called Multi-Dimensional Knowledge Distillation, which combines feature-based distillation and response-based distillation. We employ a 12-layer Vision Transformer as the teacher model. Feature-based distillation integrates feature information from the spatial, channel, and class-token dimensions, while response-based distillation is achieved through alignment with targets. We integrate information from these dimensions and compress the knowledge into a 3-layer Vision Transformer, which serves as the student model. The student model is trained and evaluated on a dataset containing 22,111 thyroid cytopathological patches. Our student model attains a Top-1 classification accuracy of 94.87%. Compared with the teacher model, the gap in accuracy is only 0.55%, while the computational complexity of the model is reduced by approximately a factor of four. In addition, the student model inherits much of the teacher model's generalization advantage. These results collectively demonstrate the effectiveness of Multi-Dimensional Knowledge Distillation in knowledge transfer.
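The abstract does not give the loss formulas, so the following is only a minimal sketch of how feature-based and response-based terms might be combined into one training objective. Everything specific here is an assumption for illustration: the mean-squared-error feature terms, the temperature-softened KL response term, the added hard-label cross-entropy, the weights, and all function and key names (`distillation_loss`, `spatial`, `channel`, `cls_token`, `logits`). It is not the paper's implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(student_values, teacher_values):
    """Mean squared error between two equally long feature vectors."""
    squared = [(s - t) ** 2 for s, t in zip(student_values, teacher_values)]
    return sum(squared) / len(teacher_values)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cross_entropy(logits, label):
    """Hard-label cross-entropy for a single example."""
    return -math.log(softmax(logits)[label])


def distillation_loss(student, teacher, label,
                      weights=(1.0, 1.0, 1.0, 1.0), temperature=2.0):
    """Hypothetical combined loss: hard-label CE + feature terms + response term.

    `student` and `teacher` are dicts of flattened features and logits.
    The weights and temperature are illustrative defaults, not values
    reported in the source.
    """
    w_spatial, w_channel, w_cls, w_response = weights

    # Feature-based part (assumed MSE): one term per feature dimension
    # named in the abstract.
    feature_term = (
        w_spatial * mse(student["spatial"], teacher["spatial"])
        + w_channel * mse(student["channel"], teacher["channel"])
        + w_cls * mse(student["cls_token"], teacher["cls_token"])
    )

    # Response-based part (assumed softened KL between output distributions).
    soft_teacher = softmax(teacher["logits"], temperature)
    soft_student = softmax(student["logits"], temperature)
    response_term = temperature ** 2 * kl_divergence(soft_teacher, soft_student)

    return (cross_entropy(student["logits"], label)
            + feature_term
            + w_response * response_term)


# Toy values, invented purely to exercise the function.
teacher = {"spatial": [0.2, 0.4], "channel": [1.0, 0.5],
           "cls_token": [0.3, 0.1], "logits": [2.0, 0.5, -1.0]}
student = {"spatial": [0.1, 0.5], "channel": [0.8, 0.5],
           "cls_token": [0.3, 0.0], "logits": [1.5, 0.7, -0.8]}
loss = distillation_loss(student, teacher, label=0)
```

When the student reproduces the teacher's features and logits exactly, the feature and response terms vanish and only the hard-label cross-entropy remains, which is a quick sanity check on any loss of this shape.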
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12358628 | PMC |
| http://dx.doi.org/10.1038/s41598-025-15728-9 | DOI Listing |
IEEE Trans Pattern Anal Mach Intell
September 2025
The spectacular success of training large models on extensive datasets highlights the potential of scaling up for exceptional performance. To deploy these models on edge devices, knowledge distillation (KD) is commonly used to create a compact model from a larger, pretrained teacher model. However, as models and datasets rapidly scale up in practical applications, it is crucial to consider the applicability of existing KD approaches originally designed for limited-capacity architectures and small-scale datasets.
BMJ Open
September 2025
Nursing Department, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.
Objectives: To gain an in-depth understanding of the actual support priorities of caregivers of individuals with end-stage heart failure and of their perceptions of hospice care.
Design: A qualitative descriptive approach was employed.
Participants And Setting: Using a purposive sampling approach, 16 primary caregivers of individuals receiving care with end-stage heart failure from a tertiary hospital in Hangzhou, Zhejiang province, were selected as interview participants.
IEEE Trans Neural Netw Learn Syst
September 2025
Knowledge distillation (KD) aims to transfer knowledge from a large-scale teacher model to a lightweight one, significantly reducing computational and storage requirements. However, the inherent learning capacity gap between the teacher and student often hinders the sufficient transfer of knowledge, motivating numerous studies to address this challenge. Inspired by the progressive approximation principle in the Stone-Weierstrass theorem, we propose expandable residual approximation (ERA), a novel KD method that decomposes the approximation of residual knowledge into multiple steps, reducing the difficulty of mimicking the teacher's representation through a divide-and-conquer approach.
IEEE Internet Things J
August 2025
Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA.
Human gait analysis with wearable sensors has been widely used in various applications, such as daily life healthcare, rehabilitation, physical therapy, and clinical diagnostics and monitoring. In particular, ground reaction force (GRF) provides critical information about how the body interacts with the ground during locomotion. Although instrumented treadmills have been widely used as the gold standard for measuring GRF during walking, their lack of portability and high cost make them impractical for many applications.
Front Neurorobot
August 2025
College of Air Traffic Management, Civil Aviation Flight University of China, Chengdu, China.
Introduction: To address the challenges of current 4D trajectory prediction, specifically limited multi-factor feature extraction and excessive computational cost, this study develops a lightweight prediction framework tailored for real-time air-traffic management.
Methods: We propose a hybrid RCBAM-TCN-LSTM architecture enhanced with a teacher-student knowledge distillation mechanism. The Residual Convolutional Block Attention Module (RCBAM) serves as the teacher network to extract high-dimensional spatial features via residual structures and channel-spatial attention.