DHE: A Semantic-Preserving Framework for Robust Post-Training Quantization of Vision Transformers

Digital Technologies Research and Applications

Article

Mao, X., & Chen, Y. (2026). DHE: A Semantic-Preserving Framework for Robust Post-Training Quantization of Vision Transformers. Digital Technologies Research and Applications, 5(1), 245–264. https://doi.org/10.54963/dtra.v5i1.1932

Authors

  • Xuze Mao

    College of Information and Network Security, Yunnan Police College, Kunming 650223, China
  • Yu Chen

    College of Information and Network Security, Yunnan Police College, Kunming 650223, China

Received: 25 November 2025; Revised: 13 January 2026; Accepted: 23 January 2026; Published: 3 March 2026

Although Vision Transformers (ViTs) have achieved impressive results in computer vision, their high computational and memory costs pose considerable obstacles to deployment. Post-training quantization (PTQ) is an effective compression method, but it can cause severe accuracy degradation in ViTs, mainly because it disturbs the attention mechanism. Building on the fixed-threshold scheme proposed by Zhenhua Liu et al., this paper addresses this drawback by introducing a Dynamic Hybrid Enhancement (DHE) framework, which shifts the quantization paradigm from numerical reconstruction to semantic preservation. The main innovations are: a dynamic mechanism that adjusts the ranking loss according to shifts in the attention distribution; a sensitivity-aware weighting matrix that prioritizes semantically important attention connections; and a multi-head normalization strategy that balances optimization across attention heads. Extensive experiments on CIFAR-10 and CIFAR-100 show that DHE achieves accuracies of 67.22% and 37.62%, improvements of 1.82 and 2.23 percentage points over the baseline PTQ model (i.e., the fixed-threshold method of Zhenhua Liu et al.). Ablation studies confirm the contribution of each component, and semantic preservation is verified through attention visualization and quantitative measures (e.g., Attention Ranking Preservation Rate, ARPR = 94.7% on CIFAR-10 and 90.1% on CIFAR-100), demonstrating the superiority of the proposed framework over conventional approaches.
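To give a concrete sense of the ranking-preservation idea behind the ARPR metric mentioned above, the PyTorch sketch below shows one plausible way to score how well the top-k attention rankings of a full-precision model survive quantization. The function name, the top-k formulation, and the tensor shapes are illustrative assumptions for this sketch, not the paper's exact definition.

```python
import torch

def attention_ranking_preservation_rate(attn_fp, attn_q, top_k=8):
    """Fraction of top-k attended positions retained after quantization.

    attn_fp, attn_q: attention maps of shape (heads, tokens, tokens) from the
    full-precision and quantized models. Illustrative variant of ARPR only.
    """
    # Indices of the top-k keys per query, before and after quantization.
    top_fp = attn_fp.topk(top_k, dim=-1).indices              # (H, N, k)
    top_q = attn_q.topk(top_k, dim=-1).indices                # (H, N, k)

    # For each full-precision top-k position, check whether it also appears
    # among the quantized model's top-k positions for the same query.
    matches = (top_fp.unsqueeze(-1) == top_q.unsqueeze(-2)).any(-1)  # (H, N, k)
    return matches.float().mean().item()                      # scalar in [0, 1]

if __name__ == "__main__":
    # Example: a 4-head, 16-token attention block with simulated quantization noise.
    fp = torch.rand(4, 16, 16).softmax(-1)
    q = (fp + 0.05 * torch.randn_like(fp)).softmax(-1)
    print(f"ARPR ≈ {attention_ranking_preservation_rate(fp, q):.3f}")
```

A score of 1.0 would mean quantization left every top-k attention ranking intact; lower values indicate that semantically important attention connections have been reordered.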

Keywords:

Image Classification; Vision Transformers; Post-Training Quantization; Precision Quantization

References

  1. Li, Y.; Ma, Z.; Wang, Y.; et al. Survey of Vision Transformers (ViT). Comput. Sci. 2025, 52, 194–209. DOI: https://doi.org/10.11896/jsjkx.240600135 (in Chinese)
  2. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; et al. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368. DOI: https://doi.org/10.1007/s41095-022-0271-y
  3. Vaishnav, M.; Cadene, R.; Alamia, A.; et al. Understanding the Computational Demands Underlying Visual Reasoning. Neural Comput. 2022, 34, 1075–1099. DOI: https://doi.org/10.1162/neco_a_01485
  4. Liu, Z.; Wang, Y.; Han, K.; et al. Post-Training Quantization for Vision Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 28092–28103.
  5. Wang, P.; Chen, Q.; He, X.; et al. Optimization-Based Post-Training Quantization With Bit-Split and Stitching. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 2119–2135. DOI: https://doi.org/10.1109/TPAMI.2022.3159369
  6. Jiang, Y.; Sun, N.; Xie, X.; et al. ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers. Neural Netw. 2025, 186, 107289. DOI: https://doi.org/10.1016/j.neunet.2025.107289
  7. Nguyen, P.T.Q.; Khanh, T.C.; Ergu, Y.A.; et al. Q-Drop: Optimizing Quantum Orthogonal Networks with Statistic Pruning and Dynamic Dropout. In Proceedings of the IEEE International Conference on Communications, Montreal, QC, Canada, 8–12 June 2025; pp. 2394–2399. DOI: https://doi.org/10.1109/ICC52391.2025.11161668
  8. Zhong, Y.; Zhou, Y.; Chao, F.; et al. MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-Width Network Quantization. Pattern Recognit. 2025, 158, 111061. DOI: https://doi.org/10.1016/j.patcog.2024.111061
  9. Fang, J.; Shafiee, A.; Abdel-Aziz, H.; et al. Post-Training Piecewise Linear Quantization for Deep Neural Networks. In Computer Vision – ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., et al., Eds.; Springer: Cham, Switzerland, 2020; 12347, pp. 66–89. DOI: https://doi.org/10.1007/978-3-030-58536-5_5
  10. Nagel, M.; Baalen, M.; Blankevoort, T.; et al. Data-Free Quantization through Weight Equalization and Bias Correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 1325–1334. DOI: https://doi.org/10.1109/ICCV.2019.00141
  11. Fan, H.; Cheng, S.; de Nazelle, A.J.; et al. An Efficient ViT-Based Spatial Interpolation Learner for Field Reconstruction. In Computational Science – ICCS 2023; Mikyška, J., de Mulatier, C., Paszynski, M., et al., Eds.; Springer: Cham, Switzerland, 2023; 10476, pp. 430–437. DOI: https://doi.org/10.1007/978-3-031-36027-5_34
  12. Fan, H.; Cheng, S.; de Nazelle, A.J.; et al. ViTAE-SL: A Vision Transformer-Based Autoencoder and Spatial Interpolation Learner for Field Reconstruction. Comput. Phys. Commun. 2025, 308, 109464. DOI: https://doi.org/10.1016/j.cpc.2024.109464
  13. Zhang, S.; Han, Q.; Wang, H.; et al. Federated Learning with Dual Dynamic Quantization Optimization in Smart Agriculture. Internet Things 2025, 101798. DOI: https://doi.org/10.1016/j.iot.2025.101798
  14. Han, S.; Zhou, W.; Lu, J.; et al. NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration. Expert Syst. Appl. 2022, 203, 117343. DOI: https://doi.org/10.1016/j.eswa.2022.117343
  15. Xiao, G.; Lin, J.; Seznec, M.; et al. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. arXiv preprint 2022, arXiv:2211.10438. DOI: https://doi.org/10.48550/arXiv.2211.10438
  16. Jiang, R.; Zhang, Y.; Wang, L.; et al. AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers. Proc. AAAI Conf. Artif. Intell. 2025, 39, 17635–17643. DOI: https://doi.org/10.1609/aaai.v39i17.33939
  17. Patro, B.N.; Namboodiri, V.P.; Agneeswaran, V.S. SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, AZ, USA, 26 February–6 March 2025. DOI: https://doi.org/10.1109/WACV61041.2025.00924
  18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. arXiv preprint 2020, arXiv:2010.11929. DOI: https://doi.org/10.48550/arXiv.2010.11929
  19. Liu, Y.; Wu, Y.H.; Sun, G.; et al. Vision Transformers with Hierarchical Attention. Mach. Intell. Res. 2024, 21, 670–683. DOI: https://doi.org/10.1007/s11633-024-1393-8
  20. Li, Y.; Wang, J.; Dai, X.; et al. How Does Attention Work in Vision Transformers? A Visual Analytics Attempt. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2888–2900. DOI: https://doi.org/10.1109/TVCG.2023.3261935
  21. Qin, H.; Zhou, D.; Xu, T.; et al. Factorization Vision Transformer: Modeling Long-Range Dependency with Local Window Cost. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 3151–3164. DOI: https://doi.org/10.1109/TNNLS.2023.3342172
  22. Chen, X.; Zhao, L.; Zou, D. How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression. Adv. Neural Inf. Process. Syst. 2024, 37, 119573–119613. DOI: https://doi.org/10.52202/079017-3799
  23. Zhang, Y.; Xu, B.; Zhao, T. Convolutional Multi-Head Self-Attention on Memory for Aspect Sentiment Classification. IEEE/CAA J. Autom. Sin. 2020, 7, 1038–1044. DOI: https://doi.org/10.1109/JAS.2020.1003243
  24. Han, D.; Pu, Y.; Xia, Z.; et al. Bridging the Divide: Reconsidering Softmax and Linear Attention. Adv. Neural Inf. Process. Syst. 2024, 37, 79221–79245. DOI: https://doi.org/10.52202/079017-2515
  25. Yao, Z.; Yazdani Aminabadi, R.; Zhang, M.; et al. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. In Proceedings of the 36th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 27168–27183. Available online: https://dl.acm.org/doi/10.5555/3600270.3602240
  26. Wu, D.; Wang, Y.; Fei, Y.; et al. A Novel Mixed-Precision Quantization Approach for CNNs. IEEE Access 2025, 13, 49309–49319. DOI: https://doi.org/10.1109/ACCESS.2025.3551802
  27. Ramzi, E.; Audebert, N.; Rambour, C.; et al. Optimization of Rank Losses for Image Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4317–4329. DOI: https://doi.org/10.1109/TPAMI.2025.3543846
  28. Shi, H.; Cheng, X.; Mao, W.; et al. P2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer. IEEE Trans. Very Large Scale Integr. Syst. 2024, 32, 1704–1717. DOI: https://doi.org/10.1109/TVLSI.2024.3422684