Related papers: Trainable Bitwise Soft Quantization for Input Feat…

Wideband and Entropy-Aware Deep Soft Bit Quantization

Deep learning has been recently applied to physical layer processing in digital communication systems in order to improve end-to-end performance. In this work, we introduce a novel deep learning solution for soft bit quantization across…

Signal Processing · Electrical Eng. & Systems 2021-10-20 Marius Arvinte , Jonathan I. Tamir

Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection

Efficient model inference is an important and practical issue in the deployment of deep neural network on resource constraint platforms. Network quantization addresses this problem effectively by leveraging low-bit representation and…

Computer Vision and Pattern Recognition · Computer Science 2020-01-01 Tianshu Chu , Qin Luo , Jie Yang , Xiaolin Huang

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng , Yuanjun Huang , Xuyang Guo , Yifei Huang , Feng Yan , Hai Li , Yiran Chen

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee , Byeongwook Kim

FIT: A Metric for Model Sensitivity

Model compression is vital to the deployment of deep learning on edge devices. Low precision representations, achieved via quantization of weights and activations, can reduce inference time and memory requirements. However, quantifying and…

Machine Learning · Computer Science 2022-10-18 Ben Zandonati , Adrian Alan Pol , Maurizio Pierini , Olya Sirkin , Tal Kopetz

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback however is the high computational complexity and high memory consumption of CNNs which makes them…

Computer Vision and Pattern Recognition · Computer Science 2021-02-04 Rishabh Goyal , Joaquin Vanschoren , Victor van Acht , Stephan Nijssen

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

Learned Image Compression with Soft Bit-based Rate-Distortion Optimization

This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance…

Image and Video Processing · Electrical Eng. & Systems 2019-05-02 David Alexandre , Chih-Peng Chang , Wen-Hsiao Peng , Hsueh-Ming Hang

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models…

Computer Vision and Pattern Recognition · Computer Science 2021-09-20 Ximeng Sun , Rameswar Panda , Chun-Fu Chen , Naigang Wang , Bowen Pan , Kailash Gopalakrishnan , Aude Oliva , Rogerio Feris , Kate Saenko

Soft then Hard: Rethinking the Quantization in Neural Image Compression

Quantization is one of the core components in lossy image compression. For neural image compression, end-to-end optimization requires differentiable approximations of quantization, which can generally be grouped into three categories:…

Image and Video Processing · Electrical Eng. & Systems 2024-03-26 Zongyu Guo , Zhizheng Zhang , Runsen Feng , Zhibo Chen

Training Quantized Nets: A Deeper Understanding

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems.…

Machine Learning · Computer Science 2017-11-15 Hao Li , Soham De , Zheng Xu , Christoph Studer , Hanan Samet , Tom Goldstein

Compression of Deep Neural Networks on the Fly

Thanks to their state-of-the-art performance, deep neural networks are increasingly used for object recognition. To achieve these results, they use millions of parameters to be trained. However, when targeting embedded applications the size…

Machine Learning · Computer Science 2016-03-21 Guillaume Soulié , Vincent Gripon , Maëlys Robert

Scalable Compression of Deep Neural Networks

Deep neural networks generally involve some layers with mil- lions of parameters, making them difficult to be deployed and updated on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2016-08-29 Xing Wang , Jie Liang

Low-bit Quantization of Neural Networks for Efficient Inference

Recent machine learning methods use increasingly large deep neural networks to achieve state of the art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements.…

Machine Learning · Computer Science 2019-03-26 Yoni Choukroun , Eli Kravchik , Fan Yang , Pavel Kisilev

Automatic low-bit hybrid quantization of neural networks through meta learning

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

Training with Quantization Noise for Extreme Model Compression

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the…

Machine Learning · Computer Science 2021-03-02 Angela Fan , Pierre Stock , Benjamin Graham , Edouard Grave , Remi Gribonval , Herve Jegou , Armand Joulin

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim , Jae Jun An , Kang Eun Jeon , Jong Hwan Ko

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

In the low-bit quantization field, training Binary Neural Networks (BNNs) is the extreme solution to ease the deployment of deep models on resource-constrained devices, having the lowest storage cost and significantly cheaper bit-wise…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Yikai Wang , Yi Yang , Fuchun Sun , Anbang Yao

Quantum Pointwise Convolution: A Flexible and Scalable Approach for Neural Network Enhancement

In this study, we propose a novel architecture, the Quantum Pointwise Convolution, which incorporates pointwise convolution within a quantum neural network framework. Our approach leverages the strengths of pointwise convolution to…

Machine Learning · Computer Science 2024-12-03 An Ning , Tai-Yue Li , Nan-Yow Chen

Optimizing Deep Neural Networks using Safety-Guided Self Compression

The deployment of deep neural networks on resource-constrained devices necessitates effective model com- pression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a…

Machine Learning · Computer Science 2025-05-02 Mohammad Zbeeb , Mariam Salman , Mohammad Bazzi , Ammar Mohanna