Due to its efficiency, Post-Training Quantization (PTQ) has been widely adopted for compressing Vision Transformers (ViTs). However, ViTs quantized to low-bit representations often suffer a significant performance drop compared with their full-precision counterparts. To address this issue, reconstruction methods have been incorporated into the PTQ framework to improve performance in low-bit quantization settings. Nevertheless, existing methods apply a single, fixed reconstruction granularity and seldom explore the progressive relationships between different reconstruction granularities, which leads to sub-optimal quantization results in ViTs. To this end, we propose a Progressive Fine-to-Coarse Reconstruction (PFCR) method for accurate PTQ, which significantly improves the performance of low-bit quantized vision transformers. Specifically, we define the multi-head self-attention and multi-layer perceptron modules, along with their shortcuts, as the finest reconstruction units. After reconstructing these two fine-grained units, we combine them to form coarser blocks and reconstruct them at a coarser granularity level. We perform this combination and reconstruction process iteratively, achieving progressive fine-to-coarse reconstruction. Additionally, we introduce a Progressive Optimization Strategy (POS) for PFCR to alleviate the difficulty of training, thereby further enhancing model performance. Experimental results on the ImageNet dataset demonstrate that our proposed method achieves the best Top-1 accuracy among state-of-the-art methods, notably attaining 75.61% for 3-bit quantized ViT-B in PTQ. In addition, quantization results on the COCO dataset show that the proposed method is effective and generalizes to other computer vision tasks such as object detection and instance segmentation.
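The fine-to-coarse schedule described in the abstract can be illustrated with a minimal sketch. It only enumerates which modules would be reconstructed together at each stage; it does not perform quantization or optimization. The pairwise merging rule, the module names (`block0.mhsa`, `block0.mlp`), and the function names are all hypothetical choices made for illustration, since the abstract does not specify how units are combined.

```python
from typing import List, Tuple

# A reconstruction unit is an ordered tuple of module names (hypothetical naming).
Unit = Tuple[str, ...]


def finest_units(num_blocks: int) -> List[Unit]:
    """Finest granularity: each MHSA and each MLP (with its shortcut) is its own unit."""
    units: List[Unit] = []
    for i in range(num_blocks):
        units.append((f"block{i}.mhsa",))
        units.append((f"block{i}.mlp",))
    return units


def coarsen(units: List[Unit]) -> List[Unit]:
    """Merge adjacent units pairwise (an assumed rule); an odd trailing unit is kept as-is."""
    merged: List[Unit] = []
    for i in range(0, len(units), 2):
        pair = units[i:i + 2]
        merged.append(tuple(name for unit in pair for name in unit))
    return merged


def fine_to_coarse_schedule(num_blocks: int, num_stages: int) -> List[List[Unit]]:
    """Return the units to reconstruct at each stage, from finest to coarsest."""
    stages = [finest_units(num_blocks)]
    for _ in range(num_stages - 1):
        if len(stages[-1]) == 1:
            break  # everything is already a single unit; nothing left to merge
        stages.append(coarsen(stages[-1]))
    return stages


if __name__ == "__main__":
    for stage, units in enumerate(fine_to_coarse_schedule(num_blocks=2, num_stages=3)):
        print(f"stage {stage}: {units}")
```

Under these assumptions, a two-block model goes from four single-module units, to two block-level units, to one unit spanning both blocks. In an actual PTQ pipeline, each stage would minimize the output error between the quantized and full-precision versions of every unit before moving to the next, coarser stage.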
DOI: http://dx.doi.org/10.1016/j.neunet.2025.107558
IEEE Trans Neural Netw Learn Syst
August 2025
Recent advances in deep-learning-based remote sensing image super-resolution (RSISR) have garnered significant attention. Conventional models typically perform upsampling at the end of the architecture, which reduces computational effort but leads to information loss and limits image quality. Moreover, the structural complexity and texture diversity of remote sensing images pose challenges in detail preservation.
Interdiscip Sci
August 2025
UTSEUS, Shanghai University, Shanghai, 200444, China.
The segmentation of brain tumor magnetic resonance imaging (MRI) plays a crucial role in assisting diagnosis, treatment planning, and disease progression evaluation. Convolutional neural networks (CNNs) and transformer-based methods have achieved significant progress due to their local and global feature extraction capabilities. However, similar to other medical image segmentation tasks, challenges remain in addressing issues such as blurred boundaries, small lesion volumes, and interwoven regions.
Neural Netw
September 2025
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 401331, China.
Due to its efficiency, Post-Training Quantization (PTQ) has been widely adopted for compressing Vision Transformers (ViTs). However, ViTs quantized to low-bit representations often suffer a significant performance drop compared with their full-precision counterparts. To address this issue, reconstruction methods have been incorporated into the PTQ framework to improve performance in low-bit quantization settings.
Cereb Cortex
April 2022
Université de Lorraine, CNRS, CRAN, 54000 Nancy, France.
At what level of spatial resolution can the human brain recognize a familiar face in a crowd of strangers? Does it depend on whether one approaches or rather moves back from the crowd? To answer these questions, 16 observers viewed different unsegmented images of unfamiliar faces alternating at 6 Hz, with spatial frequency (SF) content progressively increasing (i.e., coarse-to-fine) or decreasing (fine-to-coarse) in different sequences.
Cancer Biomark
November 2020
Department of Radiation Oncology, Sichuan Cancer Hospital and Institution, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Radiation Oncology Key Laboratory of Sichuan, Chengdu, Sichuan, China.
Introduction: To study the relationship between the tumor heterogeneity based on CT and overall survival (OS) in oesophageal squamous cell carcinoma treated with chemotherapy and radiation therapy (CRT).
Methods: Fifty-seven clinical patients who underwent definitive CRT were analyzed. The results were analyzed in terms of whole-tumor texture, with quantification of entropy, mean gray-level intensity for fine to coarse textures (filters 1.