English
Related papers

Related papers: Precision Neural Network Quantization via Learnabl…

200 papers

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia , Zhiyong Huang , Zhi Yu , Mingyang Hou , Shuai Miao , Han Yang

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long…

Machine Learning · Computer Science 2024-08-21 Xijie Huang , Zechun Liu , Shih-Yang Liu , Kwang-Ting Cheng

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior…

Machine Learning · Computer Science 2026-02-27 Aleksandr Dremov , David Grangier , Angelos Katharopoulos , Awni Hannun

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models.…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-26 David Qiu , David Rim , Shaojin Ding , Oleg Rybakov , Yanzhang He

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon , Silviu-Ioan Filip , Olivier Sentieys , Clément Coggiola , Mickael Bruno

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu , Shu Yang , Fengzhao Zhang , Hanzhi Ma , Aili Wang , Er-Ping Li

Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with…

Machine Learning · Computer Science 2022-12-13 Zheng Wang , Juncheng B Li , Shuhui Qu , Florian Metze , Emma Strubell

This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our…

Machine Learning · Computer Science 2023-10-05 Tianheng Ling , Chao Qian , Lukas Einhaus , Gregor Schiele

Weight quantization is used to deploy high-performance deep learning models on resource-limited hardware, enabling the use of low-precision integers for storage and computation. Spiking neural networks (SNNs) share the goal of enhancing…

Neural and Evolutionary Computing · Computer Science 2024-05-01 Sreyes Venkatesh , Razvan Marinescu , Jason K. Eshraghian

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing…

At present, the quantification methods of neural network models are mainly divided into post-training quantization (PTQ) and quantization aware training (QAT). Post-training quantization only need a small part of the data to complete the…

Machine Learning · Computer Science 2022-07-08 Huabin Diao , Gongyan Li , Shaoyun Xu , Yuexing Hao

Quantization reduces computation costs of neural networks but suffers from performance degeneration. Is this accuracy drop due to the reduced capacity, or inefficient training during the quantization procedure? After looking into the…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Qing Jin , Linjie Yang , Zhenyu Liao

Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile…

Machine Learning · Computer Science 2025-07-15 Anmol Biswas , Raghav Singhal , Sivakumar Elangovan , Shreyas Sabnis , Udayan Ganguly

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel , Marios Fournarakis , Rana Ali Amjad , Yelysei Bondarenko , Mart van Baalen , Tijmen Blankevoort

Quantization is a technique for reducing deep neural networks (DNNs) training and inference times, which is crucial for training in resource constrained environments or applications where inference is time critical. State-of-the-art (SOTA)…

Machine Learning · Computer Science 2023-05-24 Lorenz Kummer , Kevin Sidak , Tabea Reichmann , Wilfried Gansterer

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang , Zhiwei Lin , Zhongyu Xia , Yongtao Wang

We propose Additive Powers-of-Two~(APoT) quantization, an efficient non-uniform quantization scheme for the bell-shaped and long-tailed distribution of weights and activations in neural networks. By constraining all quantization levels as…

Machine Learning · Computer Science 2020-02-04 Yuhang Li , Xin Dong , Wei Wang

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu , Lin Niu , Zhihang Yuan , Dawei Yang , Xinggang Wang , Wenyu Liu

Reasoning models excel at complex tasks such as coding and mathematics, yet their inference is often slow and token-inefficient. To improve the inference efficiency, post-training quantization (PTQ) usually comes with the cost of large…

Machine Learning · Computer Science 2026-01-22 Keyu Lv , Manyi Zhang , Xiaobo Xia , Jingchen Ni , Shannan Yan , Xianzhi Yu , Lu Hou , Chun Yuan , Haoli Bai
‹ Prev 1 2 3 10 Next ›