Related papers: Transform Quantization for CNN (Convolutional Neur…

Post-training 4-bit quantization of convolution networks for rapid-deployment

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback however is the high computational complexity and high memory consumption of CNNs which makes them…

Computer Vision and Pattern Recognition · Computer Science 2021-02-04 Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen

Compact representations of convolutional neural networks via weight pruning and quantization

The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing,…

Machine Learning · Computer Science 2021-08-31 Giosuè Cataldo Marinò, Alessandro Petrini, Dario Malchiodi, Marco Frasca

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Machine Learning · Computer Science 2019-12-20 Bram-Ernst Verhoef, Nathan Laubeuf, Stefan Cosemans, Peter Debacker, Ioannis Papistas, Arindam Mallik, Diederik Verkest

Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression

Over-fitting-based image compression requires weights compactness for compression and fast convergence for practical use, posing challenges for deep convolutional neural networks (CNNs) based methods. This paper presents a simple…

Image and Video Processing · Electrical Eng. & Systems 2023-10-13 Yun Ye, Yanjie Pan, Qually Jiang, Ming Lu, Xiaoran Fang, Beryl Xu

A Data-Free Analytical Quantization Scheme for Deep Learning Models

Despite the success of CNN models on a variety of Image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-10 Ahmed Luqman, Khuzemah Qazi, Murray Patterson, Malik Jahan Khan, Imdadullah Khan

CTMQ: Cyclic Training of Convolutional Neural Networks with Multiple Quantization Steps

This paper proposes a training method having multiple cyclic training for achieving enhanced performance in low-bit quantized convolutional neural networks (CNNs). Quantization is a popular method for obtaining lightweight CNNs, where the…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 HyunJin Kim, Jungwoo Shin, Alberto A. Del Barrio

Focused Quantization for Sparse CNNs

Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing…

Machine Learning · Computer Science 2019-10-30 Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu

Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization

Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in…

Machine Learning · Computer Science 2020-06-23 Yuan Wen, David Gregg

Post-Training Quantization for Vision Transformer

Recently, transformer has achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers are often of sophisticated architectures for extracting…

Computer Vision and Pattern Recognition · Computer Science 2021-06-29 Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization (PTQ) method achieving highly efficient Convolutional Neural Network (CNN) quantization with high performance. Previous PTQ methods usually reduce compression error via performing…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

Kernel Quantization for Efficient Network Compression

This paper presents a novel network compression framework Kernel Quantization (KQ), targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant…

Machine Learning · Computer Science 2020-03-12 Zhongzhi Yu, Yemin Shi, Tiejun Huang, Yizhou Yu

Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single…

Computer Vision and Pattern Recognition · Computer Science 2021-04-13 Julieta Martinez, Jashan Shewakramani, Ting Wei Liu, Ioan Andrei Bârsan, Wenyuan Zeng, Raquel Urtasun

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee, Byeongwook Kim

Compression strategies and space-conscious representations for deep neural networks

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of…

Machine Learning · Computer Science 2020-07-17 Giosuè Cataldo Marinò, Gregorio Ghidoli, Marco Frasca, Dario Malchiodi

PTQ-SL: Exploring the Sub-layerwise Post-training Quantization

Network quantization is a powerful technique to compress convolutional neural networks. The quantization granularity determines how to share the scaling factors in weights, which affects the performance of network quantization. Most…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zhihang Yuan, Yiqi Chen, Chenhao Xue, Chenguang Zhang, Qiankun Wang, Guangyu Sun

Quantization of Deep Neural Networks for Accurate Edge Computing

Deep neural networks (DNNs) have demonstrated their great potential in recent years, exceeding the performance of human experts in a wide range of applications. Due to their large sizes, however, compression techniques such as weight…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Wentao Chen, Hailong Qiu, Jian Zhuang, Chutong Zhang, Yu Hu, Qing Lu, Tianchen Wang, Yiyu Shi, Meiping Huang, Xiaowe Xu

Model compression as constrained optimization, with application to neural nets. Part II: quantization

We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with $K$ entries so that the training loss of the quantized net is minimal.…

Machine Learning · Computer Science 2017-07-17 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

The biggest challenge for the deployment of Deep Neural Networks (DNNs) close to the generated data on edge devices is their size, i.e., memory footprint and computational complexity. Both are significantly reduced with quantization. With…

Machine Learning · Computer Science 2022-10-17 Cecilia Latotzke, Batuhan Balim, Tobias Gemmeke

Transform-Based Feature Map Compression for CNN Inference

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) are designed recently. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly…

Image and Video Processing · Electrical Eng. & Systems 2021-06-25 Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang