DHE: A Semantic-Preserving Framework for Robust Post-Training Quantization of Vision Transformers

Digital Technologies Research and Applications

Article

Mao, X., & Chen, Y. (2026). DHE: A Semantic-Preserving Framework for Robust Post-Training Quantization of Vision Transformers. Digital Technologies Research and Applications, 5(1), 245–264. https://doi.org/10.54963/dtra.v5i1.1932

Authors

  • Xuze Mao

    College of Information and Network Security, Yunnan Police College, Kunming 650223, China
  • Yu Chen

    College of Information and Network Security, Yunnan Police College, Kunming 650223, China

Received: 25 November 2025; Revised: 13 January 2026; Accepted: 23 January 2026; Published: 3 March 2026

Although Vision Transformers (ViTs) have achieved impressive results in computer vision, their high computational and memory costs pose considerable obstacles to deployment. Post-training quantization (PTQ) is an effective compression method, but it can cause severe accuracy degradation in ViTs, mainly because it disturbs the attention mechanism. Building on the fixed-threshold scheme proposed by Zhenhua Liu et al., this paper addresses this drawback by introducing a Dynamic Hybrid Enhancement (DHE) framework, which shifts the quantization paradigm from numerical reconstruction to semantic preservation. The main innovations are: a dynamic mechanism that adjusts the ranking loss according to shifts in the attention distribution; a sensitivity-aware weighting matrix that prioritizes semantically important attention connections; and a multi-head normalization strategy that balances optimization across attention heads. Extensive experiments on CIFAR-10 and CIFAR-100 show that DHE achieves accuracies of 67.22% and 37.62%, improvements of 1.82 and 2.23 percentage points over the baseline PTQ model (i.e., the fixed-threshold method of Zhenhua Liu et al.). Ablation studies confirm the contribution of each component, and semantic preservation is verified through attention visualization and quantitative measures (e.g., Attention Ranking Preservation Rate, ARPR = 94.7% on CIFAR-10 and 90.1% on CIFAR-100), demonstrating the superiority of the proposed framework over conventional approaches.
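To give a concrete sense of the ranking-preservation idea behind the ARPR metric mentioned above, the PyTorch sketch below shows one plausible way to score how well the top-k attention rankings of a full-precision model survive quantization. The function name, the top-k formulation, and the tensor shapes are illustrative assumptions for this sketch, not the paper's exact definition.

```python
import torch

def attention_ranking_preservation_rate(attn_fp, attn_q, top_k=8):
    """Fraction of top-k attended positions retained after quantization.

    attn_fp, attn_q: attention maps of shape (heads, tokens, tokens) from the
    full-precision and quantized models. Illustrative variant of ARPR only.
    """
    # Indices of the top-k keys per query, before and after quantization.
    top_fp = attn_fp.topk(top_k, dim=-1).indices              # (H, N, k)
    top_q = attn_q.topk(top_k, dim=-1).indices                # (H, N, k)

    # For each full-precision top-k position, check whether it also appears
    # among the quantized model's top-k positions for the same query.
    matches = (top_fp.unsqueeze(-1) == top_q.unsqueeze(-2)).any(-1)  # (H, N, k)
    return matches.float().mean().item()                      # scalar in [0, 1]

if __name__ == "__main__":
    # Example: a 4-head, 16-token attention block with simulated quantization noise.
    fp = torch.rand(4, 16, 16).softmax(-1)
    q = (fp + 0.05 * torch.randn_like(fp)).softmax(-1)
    print(f"ARPR ≈ {attention_ranking_preservation_rate(fp, q):.3f}")
```

A score of 1.0 would mean quantization left every top-k attention ranking intact; lower values indicate that semantically important attention connections have been reordered.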

Keywords:

Image Classification; Vision Transformers; Post-Training Quantization; Precision Quantization

References

  1. Li, Y.; Ma, Z.; Wang, Y.; et al. Survey of Vision Transformers (ViT). Comput. Sci. 2025, 52, 194–209. DOI: https://doi.org/10.11896/jsjkx.240600135 (in Chinese)
  2. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; et al. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368. DOI: https://doi.org/10.1007/s41095-022-0271-y
  3. Vaishnav, M.; Cadene, R.; Alamia, A.; et al. Understanding the Computational Demands Underlying Visual Reasoning. Neural Comput. 2022, 34, 1075–1099. DOI: https://doi.org/10.1162/neco_a_01485
  4. Liu, Z.; Wang, Y.; Han, K.; et al. Post-Training Quantization for Vision Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 28092–28103.
  5. Wang, P.; Chen, Q.; He, X.; et al. Optimization-Based Post-Training Quantization With Bit-Split and Stitching. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 2119–2135. DOI: https://doi.org/10.1109/TPAMI.2022.3159369
  6. Jiang, Y.; Sun, N.; Xie, X.; et al. ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers. Neural Netw. 2025, 186, 107289. DOI: https://doi.org/10.1016/j.neunet.2025.107289
  7. Nguyen, P.T.Q.; Khanh, T.C.; Ergu, Y.A.; et al. Q-Drop: Optimizing Quantum Orthogonal Networks with Statistic Pruning and Dynamic Dropout. In Proceedings of the IEEE International Conference on Communications, Montreal, QC, Canada, 8–12 June 2025; pp. 2394–2399. DOI: https://doi.org/10.1109/ICC52391.2025.11161668
  8. Zhong, Y.; Zhou, Y.; Chao, F.; et al. MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-Width Network Quantization. Pattern Recognit. 2025, 158, 111061. DOI: https://doi.org/10.1016/j.patcog.2024.111061
  9. Fang, J.; Shafiee, A.; Abdel-Aziz, H.; et al. Post-Training Piecewise Linear Quantization for Deep Neural Networks. In Computer Vision – ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., et al., Eds.; Springer: Cham, Switzerland, 2020; 12347, pp. 66–89. DOI: https://doi.org/10.1007/978-3-030-58536-5_5
  10. Nagel, M.; Baalen, M.; Blankevoort, T.; et al. Data-Free Quantization through Weight Equalization and Bias Correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 1325–1334. DOI: https://doi.org/10.1109/ICCV.2019.00141
  11. Fan, H.; Cheng, S.; de Nazelle, A.J.; et al. An Efficient ViT-Based Spatial Interpolation Learner for Field Reconstruction. In Computational Science – ICCS 2023; Mikyška, J., de Mulatier, C., Paszynski, M., et al., Eds.; Springer: Cham, Switzerland, 2023; 10476, pp. 430–437. DOI: https://doi.org/10.1007/978-3-031-36027-5_34
  12. Fan, H.; Cheng, S.; de Nazelle, A.J.; et al. ViTAE-SL: A Vision Transformer-Based Autoencoder and Spatial Interpolation Learner for Field Reconstruction. Comput. Phys. Commun. 2025, 308, 109464. DOI: https://doi.org/10.1016/j.cpc.2024.109464
  13. Zhang, S.; Han, Q.; Wang, H.; et al. Federated Learning with Dual Dynamic Quantization Optimization in Smart Agriculture. Internet Things 2025, 101798. DOI: https://doi.org/10.1016/j.iot.2025.101798
  14. Han, S.; Zhou, W.; Lu, J.; et al. NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration. Expert Syst. Appl. 2022, 203, 117343. DOI: https://doi.org/10.1016/j.eswa.2022.117343
  15. Xiao, G.; Lin, J.; Seznec, M.; et al. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. arXiv preprint 2022, arXiv:2211.10438. DOI: https://doi.org/10.48550/arXiv.2211.10438
  16. Jiang, R.; Zhang, Y.; Wang, L.; et al. AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers. Proc. AAAI Conf. Artif. Intell. 2025, 39, 17635–17643. DOI: https://doi.org/10.1609/aaai.v39i17.33939
  17. Patro, B.N.; Namboodiri, V.P.; Agneeswaran, V.S. SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, AZ, USA, 26 February–6 March 2025. DOI: https://doi.org/10.1109/WACV61041.2025.00924
  18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. arXiv preprint 2020, arXiv:2010.11929. DOI: https://doi.org/10.48550/arXiv.2010.11929
  19. Liu, Y.; Wu, Y.H.; Sun, G.; et al. Vision Transformers with Hierarchical Attention. Mach. Intell. Res. 2024, 21, 670–683. DOI: https://doi.org/10.1007/s11633-024-1393-8
  20. Li, Y.; Wang, J.; Dai, X.; et al. How Does Attention Work in Vision Transformers? A Visual Analytics Attempt. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2888–2900. DOI: https://doi.org/10.1109/TVCG.2023.3261935
  21. Qin, H.; Zhou, D.; Xu, T.; et al. Factorization Vision Transformer: Modeling Long-Range Dependency with Local Window Cost. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 3151–3164. DOI: https://doi.org/10.1109/TNNLS.2023.3342172
  22. Chen, X.; Zhao, L.; Zou, D. How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression. Adv. Neural Inf. Process. Syst. 2024, 37, 119573–119613. DOI: https://doi.org/10.52202/079017-3799
  23. Zhang, Y.; Xu, B.; Zhao, T. Convolutional Multi-Head Self-Attention on Memory for Aspect Sentiment Classification. IEEE/CAA J. Autom. Sin. 2020, 7, 1038–1044. DOI: https://doi.org/10.1109/JAS.2020.1003243
  24. Han, D.; Pu, Y.; Xia, Z.; et al. Bridging the Divide: Reconsidering Softmax and Linear Attention. Adv. Neural Inf. Process. Syst. 2024, 37, 79221–79245. DOI: https://doi.org/10.52202/079017-2515
  25. Yao, Z.; Yazdani Aminabadi, R.; Zhang, M.; et al. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. In Proceedings of the 36th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 27168–27183. Available online: https://dl.acm.org/doi/10.5555/3600270.3602240
  26. Wu, D.; Wang, Y.; Fei, Y.; et al. A Novel Mixed-Precision Quantization Approach for CNNs. IEEE Access 2025, 13, 49309–49319. DOI: https://doi.org/10.1109/ACCESS.2025.3551802
  27. Ramzi, E.; Audebert, N.; Rambour, C.; et al. Optimization of Rank Losses for Image Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4317–4329. DOI: https://doi.org/10.1109/TPAMI.2025.3543846
  28. Shi, H.; Cheng, X.; Mao, W.; et al. P2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer. IEEE Trans. Very Large Scale Integr. Syst. 2024, 32, 1704–1717. DOI: https://doi.org/10.1109/TVLSI.2024.3422684