High efficiency classification of thyroid cytopathological images based on knowledge distillation and vision transformer.

Jiazhe Zhang , Haolin Zhang , Peng Jiang , Qin Huang , Guangya Zhu , Jingjing Chen , Yingling Cheng , Shu Ran , Fusong Jiang

Sci Rep

Department of Endocrinology and Metabolism, Shanghai Jiao Tong University School of Medicine Affiliated Sixth People's Hospital, Shanghai, 200233, China.

Published: August 2025

Thyroid cancer is one of the most common types of cancer, and pathological diagnosis based on fine needle aspiration cytology is the clinical standard for assessing it. However, the complex structure and large data volume of thyroid pathology images make accurate and efficient automatic diagnosis challenging. To address this problem, this paper proposes a knowledge distillation method called Multi-Dimensional Knowledge Distillation, which combines feature-based distillation and response-based distillation. We employ a 12-layer Vision Transformer as the teacher model. Feature-based distillation integrates feature information from the spatial, channel, and class-token dimensions, while response-based distillation is achieved through alignment with targets. We integrate information from these dimensions and compress the knowledge into a 3-layer Vision Transformer, which serves as the student model. The student model is trained and evaluated on a dataset containing 22,111 thyroid cytopathological patches. Our student model attains a Top-1 classification accuracy of 94.87%, only 0.55% below the teacher model, while the computational complexity of the model is reduced by approximately a factor of four. In addition, our method substantially inherits the generalization advantages of the teacher model. Together, these results demonstrate the effectiveness of Multi-Dimensional Knowledge Distillation in knowledge transfer.
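To make the two loss families named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the loss formulas, so the response term below assumes the common temperature-softened KL divergence, and the feature term assumes simple mean-squared errors over spatial, channel, and class-token summaries. The function names, the `temperature`, `alpha`, and `beta` parameters, and the convention that row 0 of a token matrix holds the class token are all illustrative assumptions, as is the premise that teacher and student share the same embedding width.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = math.log(sum(math.exp(z - m) for z in scaled))
    return [z - m - log_total for z in scaled]


def response_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed form: KL(teacher || student) on temperature-softened outputs."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_loss(student_tokens, teacher_tokens):
    """Assumed form: MSE over spatial, channel, and class-token summaries.

    Each argument is a list of token vectors; row 0 is taken to be the
    class token and the remaining rows are patch tokens (an assumption).
    """
    s_cls, t_cls = student_tokens[0], teacher_tokens[0]
    s_patch, t_patch = student_tokens[1:], teacher_tokens[1:]
    # Spatial summary: one value per patch (mean over channels).
    s_spatial = [sum(row) / len(row) for row in s_patch]
    t_spatial = [sum(row) / len(row) for row in t_patch]
    # Channel summary: one value per channel (mean over patches).
    s_channel = [sum(col) / len(col) for col in zip(*s_patch)]
    t_channel = [sum(col) / len(col) for col in zip(*t_patch)]
    return (mse(s_spatial, t_spatial)
            + mse(s_channel, t_channel)
            + mse(s_cls, t_cls))


def distillation_loss(student_tokens, teacher_tokens,
                      student_logits, teacher_logits,
                      alpha=1.0, beta=1.0):
    """Weighted sum of the feature and response terms (weights are hypothetical)."""
    return (alpha * feature_loss(student_tokens, teacher_tokens)
            + beta * response_loss(student_logits, teacher_logits))
```

Both terms are zero when the student reproduces the teacher exactly and grow as the student's features or output distribution drift away, which is the general behaviour any distillation objective of this kind relies on.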

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12358628
DOI: http://dx.doi.org/10.1038/s41598-025-15728-9

Publication Analysis

Top Keywords: knowledge distillation; vision transformer; teacher model; student model; thyroid cytopathological; thyroid cancer; multi-dimensional knowledge; feature-based distillation; distillation; model

Similar Publications

Toward Effective Knowledge Distillation: Navigating Beyond Small-data Pitfall.

IEEE Trans Pattern Anal Mach Intell

September 2025

Zhiwei Hao , Jianyuan Guo , Kai Han , Han Hu , Chang Xu

The spectacular success of training large models on extensive datasets highlights the potential of scaling up for exceptional performance. To deploy these models on edge devices, knowledge distillation (KD) is commonly used to create a compact model from a larger, pretrained teacher model. However, as models and datasets rapidly scale up in practical applications, it is crucial to consider the applicability of existing KD approaches originally designed for limited-capacity architectures and small-scale datasets.

Hospice care support priorities and perceptions of family caregivers of individuals with end-stage heart failure in China: a qualitative study.

BMJ Open

September 2025

Nursing Department, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.

Chunyan Chen , Haixiang Zhu , Yehua Wang , Xiaoxue Han , Qijin Xu

Objectives: To gain an in-depth understanding of the support priorities and perceptions of caregivers of individuals with end-stage heart failure regarding hospice care.

Design: A qualitative descriptive approach was employed.

Participants And Setting: Using a purposive sampling approach, 16 primary caregivers of individuals receiving care with end-stage heart failure from a tertiary hospital in Hangzhou, Zhejiang province, were selected as interview participants.

Expandable Residual Approximation for Knowledge Distillation.

IEEE Trans Neural Netw Learn Syst

September 2025

Zhaoyi Yan , Binghui Chen , Yunfan Liu , Qixiang Ye

Knowledge distillation (KD) aims to transfer knowledge from a large-scale teacher model to a lightweight one, significantly reducing computational and storage requirements. However, the inherent learning capacity gap between the teacher and student often hinders the sufficient transfer of knowledge, motivating numerous studies to address this challenge. Inspired by the progressive approximation principle in the Stone-Weierstrass theorem, we propose expandable residual approximation (ERA), a novel KD method that decomposes the approximation of residual knowledge into multiple steps, reducing the difficulty of mimicking the teacher's representation through a divide-and-conquer approach.

Ground Reaction Force Estimation via Time-aware Knowledge Distillation.

IEEE Internet Things J

August 2025

Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA.

Eun Som Jeon , Sinjini Mitra , Jisoo Lee , Omik M Save , Ankita Shukla

Human gait analysis with wearable sensors has been widely used in various applications, such as daily life healthcare, rehabilitation, physical therapy, and clinical diagnostics and monitoring. In particular, ground reaction force (GRF) provides critical information about how the body interacts with the ground during locomotion. Although instrumented treadmills have been widely used as the gold standard for measuring GRF during walking, their lack of portability and high cost make them impractical for many applications.

4D trajectory lightweight prediction algorithm based on knowledge distillation technique.

Front Neurorobot

August 2025

College of Air Traffic Management, Civil Aviation Flight University of China, Chengdu, China.

Weizhen Tang , Jie Dai , Zhousheng Huang , Boyang Hao , Weizheng Xie

Introduction: To address the challenges of current 4D trajectory prediction, specifically limited multi-factor feature extraction and excessive computational cost, this study develops a lightweight prediction framework tailored for real-time air-traffic management.

Methods: We propose a hybrid RCBAM-TCN-LSTM architecture enhanced with a teacher-student knowledge distillation mechanism. The Residual Convolutional Block Attention Module (RCBAM) serves as the teacher network to extract high-dimensional spatial features via residual structures and channel-spatial attention.
