Related papers: BitPruning: Learning Bitlengths for Aggressive and…

BitNet: Bit-Regularized Deep Neural Networks

We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the…

Machine Learning · Computer Science 2018-11-20 Aswin Raghavan, Mohamed Amer, Sek Chai, Graham Taylor

Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss

Reducing bit-widths of activations and weights of deep networks makes it efficient to compute and store them in memory, which is crucial in their deployments to resource-limited devices, such as mobile phones. However, decreasing bit-widths…

Computer Vision and Pattern Recognition · Computer Science 2018-11-26 Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Youngjun Kwak, Jae-Joon Han, Sung Ju Hwang, Changkyu Choi

Automatic Pruning for Quantized Neural Networks

Neural network quantization and pruning are two techniques commonly used to reduce the computational complexity and memory footprint of these models for deployment. However, most existing pruning strategies operate on full-precision and…

Computer Vision and Pattern Recognition · Computer Science 2020-02-04 Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

Low-bit Quantization of Neural Networks for Efficient Inference

Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements.…

Machine Learning · Computer Science 2019-03-26 Yoni Choukroun, Eli Kravchik, Fan Yang, Pavel Kisilev

Quantisation and Pruning for Neural Network Compression and Regularisation

Deep neural networks are typically too computationally expensive to run in real-time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks…

Machine Learning · Computer Science 2020-01-15 Kimessha Paupamah, Steven James, Richard Klein

Hybrid Binary Networks: Optimizing for Accuracy, Efficiency and Memory

Binarization is an extreme network compression approach that provides large computational speedups along with energy and memory savings, albeit at significant accuracy costs. We investigate the question of where to binarize inputs at…

Computer Vision and Pattern Recognition · Computer Science 2018-04-12 Ameya Prabhu, Vishal Batchu, Rohit Gajawada, Sri Aurobindo Munagala, Anoop Namboodiri

Low-Precision Batch-Normalized Activations

Artificial neural networks can be trained with relatively low-precision floating-point and fixed-point arithmetic, using between one and 16 bits. Previous works have focused on relatively wide-but-shallow, feed-forward networks. We…

Neural and Evolutionary Computing · Computer Science 2017-02-28 Benjamin Graham

Low Precision RNNs: Quantizing RNNs Without Losing Accuracy

Similar to convolutional neural networks, recurrent neural networks (RNNs) typically suffer from over-parameterization. Quantizing bit-widths of weights and activations results in runtime efficiency on hardware, yet it often comes at the cost…

Machine Learning · Computer Science 2017-10-30 Supriya Kapur, Asit Mishra, Debbie Marr

WrapNet: Neural Net Inference with Ultra-Low-Resolution Arithmetic

Low-resolution neural networks represent both weights and activations with few bits, drastically reducing the multiplication complexity. Nonetheless, these products are accumulated using high-resolution (typically 32-bit) additions, an…

Machine Learning · Computer Science 2020-07-28 Renkun Ni, Hong-min Chu, Oscar Castañeda, Ping-yeh Chiang, Christoph Studer, Tom Goldstein

Bit-wise Training of Neural Network Weights

We introduce an algorithm where the individual bits representing the weights of a neural network are learned. This method allows training weights with integer values on arbitrary bit-depths and naturally uncovers sparse networks, without…

Machine Learning · Computer Science 2022-02-22 Cristian Ivan

Heterogeneous Bitwidth Binarization in Convolutional Neural Networks

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. These networks use homogeneous binarization: all parameters in each layer or (more commonly) the whole model have the same low bitwidth…

Computer Vision and Pattern Recognition · Computer Science 2018-11-02 Josh Fromm, Shwetak Patel, Matthai Philipose

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim, Jae Jun An, Kang Eun Jeon, Jong Hwan Ko

Towards Accurate and Efficient Sub-8-Bit Integer Training

Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data…

Machine Learning · Computer Science 2024-11-19 Wenjin Guo, Donglai Liu, Weiying Xie, Yunsong Li, Xuefei Ning, Zihan Meng, Shulin Zeng, Jie Lei, Zhenman Fang, Yu Wang

ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

Deep Neural Networks (DNNs) typically require a massive amount of computational resources in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of…

Machine Learning · Computer Science 2020-04-17 Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

Learning Compact Neural Networks with Regularization

Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient. We propose and analyze regularized gradient descent algorithms for learning shallow…

Machine Learning · Computer Science 2018-06-08 Samet Oymak

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardware supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We…

Computer Vision and Pattern Recognition · Computer Science 2020-12-04 Linjie Yang, Qing Jin

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural networks on resource-constrained platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu, Qin Luo, Jie Yang, Xiaolin Huang

Quantization of Neural Network Equalizers in Optical Fiber Transmission Experiments

The quantization of neural networks for the mitigation of the nonlinear and components' distortions in dual-polarization optical fiber transmission is studied. Two low-complexity neural network equalizers are applied in three 16-QAM 34.4…

Signal Processing · Electrical Eng. & Systems 2023-10-11 Jamal Darweesh, Nelson Costa, Antonio Napoli, Bernhard Spinnler, Yves Jaouen, Mansoor Yousefi

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Training with Quantization Noise for Extreme Model Compression

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the…

Machine Learning · Computer Science 2021-03-02 Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin