Related papers: EfQAT: An Efficient Framework for Quantization-Awa…

200 papers

Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Ali Zoljodi, Radu Timofte, Masoud Daneshtalab

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior…

Machine Learning · Computer Science 2026-02-27 Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

Hybrid models that combine convolutional and transformer blocks offer strong performance in computer vision (CV) tasks but are resource-intensive for edge deployment. Although post-training quantization (PTQ) can help reduce resource…

Computer Vision and Pattern Recognition · Computer Science 2025-06-16 Shaibal Saha, Lanyu Xu

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang

Reasoning models excel at complex tasks such as coding and mathematics, yet their inference is often slow and token-inefficient. To improve the inference efficiency, post-training quantization (PTQ) usually comes with the cost of large…

Machine Learning · Computer Science 2026-01-22 Keyu Lv, Manyi Zhang, Xiaobo Xia, Jingchen Ni, Shannan Yan, Xianzhi Yu, Lu Hou, Chun Yuan, Haoli Bai

8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods: training-based quantization and post-training quantization. Training-based…

Computer Vision and Pattern Recognition · Computer Science 2020-07-01 Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang

This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our…

Machine Learning · Computer Science 2023-10-05 Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele

Quantization is an effective technique to reduce memory footprint, inference latency, and power consumption of deep learning models. However, existing quantization methods suffer from accuracy degradation compared to full-precision (FP)…

Machine Learning · Computer Science 2022-10-14 Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints, yet achieving stable and robust optimization at ultra-low bitwidths remains challenging. Common approaches based on the…

Machine Learning · Computer Science 2026-02-19 Tianyi Chen, Sihan Chen, Xiaoyi Qu, Dan Zhao, Ruomei Yan, Jongwoo Ko, Luming Liang, Pashmina Cameron

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics 2023-03-21 Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Quantization is an effective technique to reduce the deployment cost of large language models (LLMs), and post-training quantization (PTQ) has been widely studied due to its efficiency. However, existing PTQ methods are limited by their…

Machine Learning · Computer Science 2025-09-30 Qitao Tan, Xiaoying Song, Jin Lu, Guoming Li, Jun Liu, Lingzi Hong, Caiwen Ding, Jundong Li, Xiaoming Zhai, Shaoyi Huang, Wei Niu, Geng Yuan

Large language models (LLMs) are crucial in modern natural language processing and artificial intelligence. However, they face challenges in managing their significant memory requirements. Although quantization-aware training (QAT) offers a…

Machine Learning · Computer Science 2025-05-20 Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Ping Luo

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Video matting is crucial for applications such as film production and virtual reality, yet deploying its computationally intensive models on resource-constrained devices presents challenges. Quantization is a key technique for model…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Tianrui Zhu, Houyuan Chen, Ruihao Gong, Michele Magno, Haotong Qin, Kai Zhang

Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the relationship between a well-trained NN and the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Peng Xia, Junbiao Pang, Tianyang Cai

Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile…

Machine Learning · Computer Science 2025-07-15 Anmol Biswas, Raghav Singhal, Sivakumar Elangovan, Shreyas Sabnis, Udayan Ganguly

Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to…

Computer Vision and Pattern Recognition · Computer Science 2025-02-05 Yunshan Zhong, You Huang, Jiawei Hu, Yuxin Zhang, Rongrong Ji

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major…

Machine Learning · Computer Science 2020-10-28 Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez