English
Related papers

Related papers: AdaQAT: Adaptive Bit-Width Quantization-Aware Trai…

200 papers

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia , Zhiyong Huang , Zhi Yu , Mingyang Hou , Shuai Miao , Han Yang

Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve…

Machine Learning · Computer Science 2022-07-22 Jiseok Youn , Jaehun Song , Hyung-Sin Kim , Saewoong Bahk

Deep neural networks with adaptive configurations have gained increasing attention due to the instant and flexible deployment of these models on platforms with different resource budgets. In this paper, we investigate a novel option to…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Qing Jin , Linjie Yang , Zhenyu Liao

The deployment of deep neural networks on resource-constrained devices relies on quantization. While static, uniform quantization applies a fixed bit-width to all inputs, it fails to adapt to their varying complexity. Dynamic,…

Machine Learning · Computer Science 2026-03-24 Hazem Hesham Yousef Shalby , Fabrizio Pittorino , Francesca Palermo , Diana Trojaniello , Manuel Roveri

Quantization is a technique for reducing deep neural networks (DNNs) training and inference times, which is crucial for training in resource constrained environments or applications where inference is time critical. State-of-the-art (SOTA)…

Machine Learning · Computer Science 2023-05-24 Lorenz Kummer , Kevin Sidak , Tabea Reichmann , Wilfried Gansterer

Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Ghouthi Boukli Hacene , Lukas Mauch , Stefan Uhlich , Fabien Cardinaux

The deployment of deep neural networks on edge devices is a challenging task due to the increasing complexity of state-of-the-art models, requiring efforts to reduce model size and inference latency. Recent studies explore models operating…

Machine Learning · Computer Science 2025-06-16 Jinhee Kim , Seoyeon Yoon , Taeho Lee , Joo Chan Lee , Kang Eun Jeon , Jong Hwan Ko

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

Mixed-precision quantization mostly predetermines the model bit-width settings before actual training due to the non-differential bit-width sampling process, obtaining sub-optimal performance. Worse still, the conventional static…

Artificial Intelligence · Computer Science 2023-02-10 Yingchun Wang , Jingcai Guo , Song Guo , Weizhan Zhang

Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake…

Computer Vision and Pattern Recognition · Computer Science 2025-04-25 Wenqiang Zhou , Zhendong Yu , Xinyu Liu , Jiaming Yang , Rong Xiao , Tao Wang , Chenwei Tang , Jiancheng Lv

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Naigang Wang , Bowen Pan , Kailash Gopalakrishnan , Aude Oliva , Rogerio Feris , Kate Saenko

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong , Xianglong Liu , Shenghu Jiang , Tianxiang Li , Peng Hu , Jiazhen Lin , Fengwei Yu , Junjie Yan

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou , Seyed-Mohsen Moosavi-Dezfooli , Ngai-Man Cheung , Pascal Frossard

We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Zhongnan Qu , Zimu Zhou , Yun Cheng , Lothar Thiele

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potentials to increase inference speed leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang , Jiaolong Yang , Dongqiangzi Ye , Gang Hua

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer , Siddharth Joshi , Shan Li , Raul Blazquez

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original…

Computer Vision and Pattern Recognition · Computer Science 2022-07-05 Geon Park , Jaehong Yoon , Haiyang Zhang , Xing Zhang , Sung Ju Hwang , Yonina C. Eldar

Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While…

Machine Learning · Computer Science 2026-03-04 Qunyou Liu , Pengbo Yu , Marina Zapater , David Atienza

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths,…

Machine Learning · Computer Science 2023-07-11 Jorn Peters , Marios Fournarakis , Markus Nagel , Mart van Baalen , Tijmen Blankevoort

Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known…

Machine Learning · Computer Science 2016-02-25 Nitish Shirish Keskar , Albert S. Berahas
‹ Prev 1 2 3 10 Next ›