Related papers: Modular Quantization-Aware Training for 6D Object …

200 papers

Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve…

Machine Learning · Computer Science 2022-07-22 Jiseok Youn, Jaehun Song, Hyung-Sin Kim, Saewoong Bahk

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science 2024-12-23 Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our…

Machine Learning · Computer Science 2023-10-05 Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing…

Efficient inference is critical for deploying deep learning models on edge AI devices. Low-bit quantization (e.g., 3- and 4-bit) with fixed-point arithmetic improves efficiency, while low-power memory technologies like analog nonvolatile…

Machine Learning · Computer Science 2025-07-15 Anmol Biswas, Raghav Singhal, Sivakumar Elangovan, Shreyas Sabnis, Udayan Ganguly

Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv

Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) represent two mainstream model quantization approaches. However, PTQ often leads to unacceptable performance degradation in quantized models, while QAT imposes…

Computer Vision and Pattern Recognition · Computer Science 2025-08-18 Xinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno

Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware support or runtime constraints. We study…

Machine Learning · Computer Science 2026-04-02 Zifei Xu, Sayeh Sharify, Hesham Mostafa

Quantization is an effective technique to reduce memory footprint, inference latency, and power consumption of deep learning models. However, existing quantization methods suffer from accuracy degradation compared to full-precision (FP)…

Machine Learning · Computer Science 2022-10-14 Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Image enhancement models for mobile devices often struggle to balance high output quality with the fast processing speeds required by mobile hardware. While recent deep learning models can enhance low-quality mobile photos into high-quality…

Artificial Intelligence · Computer Science 2026-04-24 Dat To-Thanh, Nghia Nguyen-Trong, Hoang Vo, Hieu Bui-Minh, Tinh-Anh Nguyen-Nhu

Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints, yet achieving stable and robust optimization at ultra-low bitwidths remains challenging. Common approaches based on the…

Machine Learning · Computer Science 2026-02-19 Tianyi Chen, Sihan Chen, Xiaoyi Qu, Dan Zhao, Ruomei Yan, Jongwoo Ko, Luming Liang, Pashmina Cameron

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia, Zhiyong Huang, Zhi Yu, Mingyang Hou, Shuai Miao, Han Yang

Current quantization-aware training (QAT) methods primarily focus on enhancing the performance of quantized models on in-distribution (I.D) data, while overlooking the potential performance degradation on out-of-distribution (OOD) data. In…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu

Quantization-aware training (QAT) is a leading technique for improving the accuracy of quantized neural networks. Previous work has shown that decomposing training into a full-precision (FP) phase followed by a QAT phase yields superior…

Machine Learning · Computer Science 2026-02-27 Aleksandr Dremov, David Grangier, Angelos Katharopoulos, Awni Hannun

State-space models (SSMs) have recently gained attention in deep learning for their ability to efficiently model long-range dependencies, making them promising candidates for edge-AI applications. In this paper, we analyze the effects of…

Machine Learning · Computer Science 2025-06-17 Leo Zhao, Tristan Torchet, Melika Payvand, Laura Kriener, Filippo Moro

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science 2024-11-19 Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen Lin, Fengwei Yu, Junjie Yan, Wanli Ouyang

Large language models (LLMs) are omnipresent; however, their practical deployment is challenging due to their ever-increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and…

Machine Learning · Computer Science 2024-09-04 Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

Quantization-aware training (QAT) is a representative model compression method to reduce redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long…

Machine Learning · Computer Science 2024-08-21 Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng