Related papers: One-Bit Quantization for Random Features Models

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee, Byeongwook Kim

On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks

Compression is a key step to deploy large neural networks on resource-constrained platforms. As a popular compression technique, quantization constrains the number of distinct weight values and thus reduces the number of bits required to…

Machine Learning · Computer Science 2019-01-15 Yukun Ding, Jinglan Liu, Jinjun Xiong, Yiyu Shi

One Weight Bitwidth to Rule Them All

Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for…

Machine Learning · Computer Science 2020-09-01 Ting-Wu Chin, Pierce I-Jen Chuang, Vikas Chandra, Diana Marculescu

Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Julieta Martinez, Jashan Shewakramani, Ting Wei Liu, Ioan Andrei Bârsan, Wenyuan Zeng, Raquel Urtasun

QGen: On the Ability to Generalize in Quantization Aware Training

Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a…

Machine Learning · Computer Science 2024-04-22 MohammadHossein AskariHemmat, Ahmadreza Jeddi, Reyhane Askari Hemmat, Ivan Lazarevich, Alexander Hoffman, Sudhakar Sah, Ehsan Saboori, Yvon Savaria, Jean-Pierre David

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science 2019-10-14 Prateeth Nayak, David Zhang, Sek Chai

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu, Qin Luo, Jie Yang, Xiaolin Huang

A simple approach for quantizing neural networks

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving…

Machine Learning · Computer Science 2023-04-06 Johannes Maly, Rayan Saab

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback, however, is the high computational complexity and high memory consumption of CNNs, which makes them…

Computer Vision and Pattern Recognition · Computer Science 2021-02-04 Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen

OneBit: Towards Extremely Low-bit Large Language Models

Model quantization uses low bit-width values to represent the weight matrices of existing models to be quantized, which is a promising approach to reduce both storage and computational overheads of deploying highly anticipated LLMs.…

Computation and Language · Computer Science 2024-12-02 Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che

On Quantizing Implicit Neural Representations

The role of quantization within implicit/coordinate neural networks is still not fully understood. We note that using a canonical fixed quantization scheme during training produces poor performance at low rates due to the network weight…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Cameron Gordon, Shin-Fang Chng, Lachlan MacDonald, Simon Lucey

Data-Free Quantization Through Weight Equalization and Bias Correction

We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit…

Machine Learning · Computer Science 2019-11-26 Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard

Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance

We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during…

Machine Learning · Computer Science 2023-02-01 Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig

Post-training Quantization for Neural Networks with Provable Guarantees

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized…

Machine Learning · Computer Science 2023-01-18 Jinjie Zhang, Yixuan Zhou, Rayan Saab

Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a…

Machine Learning · Computer Science 2019-02-07 Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman

Differentiable, Bit-shifting, and Scalable Quantization without training neural network from scratch

Quantization of neural networks reduces the compute and memory requirements of inference. Previous work in quantization lacks two important aspects that this work provides. First, almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-12 Zia Badar

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua

Searching for Low-Bit Weights in Quantized Neural Networks

Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the…

Computer Vision and Pattern Recognition · Computer Science 2020-09-21 Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, Chang Xu

Least squares binary quantization of neural networks

Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the…

Machine Learning · Computer Science 2020-06-16 Hadi Pouransari, Zhucheng Tu, Oncel Tuzel