Related papers: Modular Quantization-Aware Training for 6D Object …

Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach

Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve…

Machine Learning · Computer Science 2022-07-22 Jiseok Youn, Jaehun Song, Hyung-Sin Kim, Saewoong Bahk

Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs

This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our…

Machine Learning · Computer Science 2023-10-05 Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing…

Machine Learning · Computer Science 2026-05-26 Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan

Regularization-based Framework for Quantization-, Fault- and Variability-Aware Training

Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile…

Machine Learning · Computer Science 2025-07-15 Anmol Biswas, Raghav Singhal, Sivakumar Elangovan, Shreyas Sabnis, Udayan Ganguly

Precision Neural Network Quantization via Learnable Adaptive Modules

Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv

PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno

MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference

Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware support or runtime constraints. We study…

Machine Learning · Computer Science 2026-04-02 Zifei Xu, Sayeh Sharify, Hesham Mostafa

SQuAT: Sharpness- and Quantization-Aware Training for BERT

Quantization is an effective technique to reduce memory footprint, inference latency, and power consumption of deep learning models. However, existing quantization methods suffer from accuracy degradation compared to full-precision (FP)…

Machine Learning · Computer Science 2022-10-14 Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement

Image enhancement models for mobile devices often struggle to balance high output quality with the fast processing speeds required by mobile hardware. While recent deep learning models can enhance low-quality mobile photos into high-quality…

Artificial Intelligence · Computer Science 2026-04-24 Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu

StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths

Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints, yet achieving stable and robust optimization at ultra-low bitwidths remains challenging. Common approaches based on the…

Machine Learning · Computer Science 2026-02-19 Tianyi Chen, Sihan Chen, Xiaoyi Qu, Dan Zhao, Ruomei Yan, Jongwoo Ko, Luming Liang, Pashmina Cameron

Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia, Zhiyong Huang, Zhi Yu, Mingyang Hou, Shuai Miao, Han Yang

Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness Perspective

Current quantization-aware training (QAT) methods primarily focus on enhancing the performance of quantized models on in-distribution (I.D) data, while overlooking the potential performance degradation on out-of-distribution (OOD) data. In…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu

Compute-Optimal Quantization-Aware Training

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior…

Machine Learning · Computer Science 2026-02-27 Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

Quantizing Small-Scale State-Space Models for Edge AI

State-space models (SSMs) have recently gained attention in deep learning for their ability to efficiently model long-range dependencies, making them promising candidates for edge-AI applications. In this paper, we analyze the effects of…

Machine Learning · Computer Science 2025-06-17 Leo Zhao, Tristan Torchet, Melika Payvand, Laura Kriener, Filippo Moro

EfQAT: An Efficient Framework for Quantization-Aware Training

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science 2024-11-19 Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, Fengwei Yu, Junjie Yan, Wanli Ouyang

Low-Rank Quantization-Aware Training for LLMs

Large language models (LLMs) are omnipresent, however their practical deployment is challenging due to their ever increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and…

Machine Learning · Computer Science 2024-09-04 Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

Efficient and Robust Quantization-aware Training via Adaptive Coreset Selection

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long…

Machine Learning · Computer Science 2024-08-21 Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng