Related papers: Bayesian Bits: Unifying Quantization and Pruning

A Practical Mixed Precision Algorithm for Post-Training Quantization

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to…

Machine Learning · Computer Science 2020-03-18 Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, Fengwei Yu

Single-path Bit Sharing for Automatic Loss-aware Model Compression

Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct network quantization based on the pruned model.…

Computer Vision and Pattern Recognition · Computer Science 2023-05-05 Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural networks. Mixed precision quantization is favorable with customized hardwares supporting arithmetic operations at multiple bit-widths to achieve maximum efficiency. We…

Computer Vision and Pattern Recognition · Computer Science 2020-12-04 Linjie Yang, Qing Jin

Bayesian Compression for Deep Learning

Compression and computational efficiency in deep learning have become a problem of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where…

Machine Learning · Statistics 2017-11-07 Christos Louizos, Karen Ullrich, Max Welling

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz, Wei Le

SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang, Nan Jiang, Guang Lin, Qifan Song

Pruning a neural network using Bayesian inference

Neural network pruning is a highly effective technique aimed at reducing the computational and memory demands of large neural networks. In this research paper, we present a novel approach to pruning neural networks utilizing Bayesian…

Machine Learning · Statistics 2023-08-07 Sunil Mathew, Daniel B. Rowe

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and thus, have been widely investigated. However, it lacks a systematic method to determine the…

Machine Learning · Computer Science 2021-02-23 Huanrui Yang, Lin Duan, Yiran Chen, Hai Li

Differentiable Joint Pruning and Quantization for Hardware Efficiency

We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem, trading off between model pruning and quantization automatically for hardware…

Machine Learning · Computer Science 2021-04-06 Ying Wang, Yadong Lu, Tijmen Blankevoort

Least squares binary quantization of neural networks

Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the…

Machine Learning · Computer Science 2020-06-16 Hadi Pouransari, Zhucheng Tu, Oncel Tuzel

Bayesian iterative screening in ultra-high dimensional linear regressions

Variable selection in ultra-high dimensional linear regression is often preceded by a screening step to significantly reduce the dimension. Here we develop a Bayesian variable screening method (BITS) guided by the posterior model…

Methodology · Statistics 2025-02-28 Run Wang, Somak Dutta, Vivekananda Roy

Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization

Mixed precision quantization (MPQ) is an effective quantization approach to achieve accuracy-complexity trade-off of neural network, through assigning different bit-widths to network activations and weights in each layer. The typical way of…

Machine Learning · Computer Science 2025-08-06 Haidong Kang, Lianbo Ma, Guo Yu, Shangce Gao

Low-bit Quantization of Neural Networks for Efficient Inference

Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements.…

Machine Learning · Computer Science 2019-03-26 Yoni Choukroun, Eli Kravchik, Fan Yang, Pavel Kisilev

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we…

Machine Learning · Computer Science 2021-09-23 Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf

Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization

Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly in the…

Machine Learning · Computer Science 2024-10-15 Jim Beckers, Bart van Erp, Ziyue Zhao, Kirill Kondrashov, Bert de Vries

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim, Jae Jun An, Kang Eun Jeon, Jong Hwan Ko

Quantized Neural Network Inference with Precision Batching

We present PrecisionBatching, a quantized inference algorithm for speeding up neural network execution on traditional hardware platforms at low bitwidths without the need for retraining or recalibration. PrecisionBatching decomposes a…

Machine Learning · Computer Science 2020-03-03 Maximilian Lam, Zachary Yedidia, Colby Banbury, Vijay Janapa Reddi

Automatic Pruning for Quantized Neural Networks

Neural network quantization and pruning are two techniques commonly used to reduce the computational complexity and memory footprint of these models for deployment. However, most existing pruning strategies operate on full-precision and…

Computer Vision and Pattern Recognition · Computer Science 2020-02-04 Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond

Bit-Mixer: Mixed-precision networks with runtime bit-width selection

Mixed-precision networks allow for a variable bit-width quantization for every layer in the network. A major limitation of existing work is that the bit-width for each layer must be predefined during training time. This allows little…

Machine Learning · Computer Science 2021-04-01 Adrian Bulat, Georgios Tzimiropoulos