Related papers: Neural Network-based Quantization for Network Auto…

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong , Xianglong Liu , Shenghu Jiang , Tianxiang Li , Peng Hu , Jiazhen Lin , Fengwei Yu , Junjie Yan

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus, have been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang , Lin Duan , Yiran Chen , Hai Li

DNQ: Dynamic Network Quantization

Network quantization is an effective method for the deployment of neural networks on memory and energy constrained mobile devices. In this paper, we propose a Dynamic Network Quantization (DNQ) framework which is composed of two modules: a…

Machine Learning · Computer Science 2018-12-07 Yuhui Xu , Shuai Zhang , Yingyong Qi , Jiaxian Guo , Weiyao Lin , Hongkai Xiong

MSQ: Memory-Efficient Bit Sparsification Quantization

As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and…

Machine Learning · Computer Science 2025-07-31 Seokho Han , Seoyeon Yoon , Jinhee Kim , Dongwei Wang , Kang Eun Jeon , Huanrui Yang , Jong Hwan Ko

Deep Spherical Quantization for Image Search

Hashing methods, which encode high-dimensional images with compact discrete codes, have been widely applied to enhance large-scale image retrieval. In this paper, we put forward Deep Spherical Quantization (DSQ), a novel method to make deep…

Computer Vision and Pattern Recognition · Computer Science 2019-06-10 Sepehr Eghbali , Ladan Tahvildari

Deep Neural Network Compression with Single and Multiple Level Quantization

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu , Yongzhuang Wang , Aojun Zhou , Weiyao Lin , Hongkai Xiong

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification

Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantization techniques have successfully reduced resource…

Machine Learning · Computer Science 2025-12-12 Hendrik Borras , Yong Wu , Bernhard Klein , Holger Fröning

Constraint Guided Model Quantization of Neural Networks

Deploying neural networks on the edge has become increasingly important as deep learning is being applied in an increasing amount of applications. At the edge computing hardware typically has limited resources disallowing to run neural…

Machine Learning · Computer Science 2025-09-15 Quinten Van Baelen , Peter Karsmakers

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

Mixed-precision quantization has been widely applied on deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer…

Computer Vision and Pattern Recognition · Computer Science 2023-03-01 Lirui Xiao , Huanrui Yang , Zhen Dong , Kurt Keutzer , Li Du , Shanghang Zhang

SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang , Nan Jiang , Guang Lin , Qifan Song

Hardware-friendly Deep Learning by Network Quantization and Binarization

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks

Deep neural networks have achieved state-of-the art performance on various computer vision tasks. However, their deployment on resource-constrained devices has been hindered due to their high computational and storage complexity. While…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Hassan Dbouk , Hetul Sanghvi , Mahesh Mehendale , Naresh Shanbhag

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the…

Machine Learning · Computer Science 2022-02-28 Amir Ardakani , Arash Ardakani , Brett Meyer , James J. Clark , Warren J. Gross

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Low-bit deep neural networks (DNNs) become critical for embedded applications due to their low storage requirement and computing efficiency. However, they suffer much from the non-negligible accuracy drop. This paper proposes the stochastic…

Computer Vision and Pattern Recognition · Computer Science 2017-08-04 Yinpeng Dong , Renkun Ni , Jianguo Li , Yurong Chen , Jun Zhu , Hang Su

A White Paper on Neural Network Quantization

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel , Marios Fournarakis , Rana Ali Amjad , Yelysei Bondarenko , Mart van Baalen , Tijmen Blankevoort

SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization

Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due…

Machine Learning · Computer Science 2023-09-21 Jinjie Zhang , Rayan Saab

ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. Such increases in memory and computational demands make deep learning…

Machine Learning · Computer Science 2024-06-07 Daniel Becking , Maximilian Dreyer , Wojciech Samek , Karsten Müller , Sebastian Lapuschkin

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou , Seyed-Mohsen Moosavi-Dezfooli , Ngai-Man Cheung , Pascal Frossard

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures

Existing neural networks are memory-consuming and computationally intensive, making deploying them challenging in resource-constrained environments. However, there are various methods to improve their efficiency. Two such methods are…

Machine Learning · Computer Science 2023-11-10 Anastasiia Prutianova , Alexey Zaytsev , Chung-Kuei Lee , Fengyu Sun , Ivan Koryakovskiy