Related papers: DNN Quantization with Attention

DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations

Quantization of Deep Neural Network (DNN) activations is a commonly used technique to reduce compute and memory demands during DNN inference, which can be particularly beneficial on resource-constrained devices. To achieve high accuracy,…

Machine Learning · Computer Science 2024-12-16 Wenhao Hu , Paul Henderson , José Cano

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potentials to increase inference speed leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang , Jiaolong Yang , Dongqiangzi Ye , Gang Hua

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong , Xianglong Liu , Shenghu Jiang , Tianxiang Li , Peng Hu , Jiazhen Lin , Fengwei Yu , Junjie Yan

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model…

Machine Learning · Computer Science 2024-04-29 Cédric Gernigon , Silviu-Ioan Filip , Olivier Sentieys , Clément Coggiola , Mickael Bruno

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Low-bit deep neural networks (DNNs) become critical for embedded applications due to their low storage requirement and computing efficiency. However, they suffer much from the non-negligible accuracy drop. This paper proposes the stochastic…

Computer Vision and Pattern Recognition · Computer Science 2017-08-04 Yinpeng Dong , Renkun Ni , Jianguo Li , Yurong Chen , Jun Zhu , Hang Su

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks have afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this paper, we present a comparison of model-parameter driven quantization approaches…

Computer Vision and Pattern Recognition · Computer Science 2019-10-14 Prateeth Nayak , David Zhang , Sek Chai

Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

The biggest challenge for the deployment of Deep Neural Networks (DNNs) close to the generated data on edge devices is their size, i.e., memory footprint and computational complexity. Both are significantly reduced with quantization. With…

Machine Learning · Computer Science 2022-10-17 Cecilia Latotzke , Batuhan Balim , Tobias Gemmeke

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing the numerical precision of the weights and activations of the network and by improving the network's resilience to noise. However, this gain in efficiency often comes at…

Machine Learning · Computer Science 2019-12-20 Bram-Ernst Verhoef , Nathan Laubeuf , Stefan Cosemans , Peter Debacker , Ioannis Papistas , Arindam Mallik , Diederik Verkest

A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification

Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory…

Machine Learning · Computer Science 2023-12-20 Babak Rokh , Ali Azarpeyvand , Alireza Khanteymoori

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Designing a deep neural network (DNN) with good generalization capability is a complex process especially when the weights are severely quantized. Model averaging is a promising approach for achieving the good generalization capability of…

Machine Learning · Computer Science 2020-02-04 Sungho Shin , Yoonho Boo , Wonyong Sung

ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

Deep Neural Networks (DNNs) typically require massive amount of computation resource in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of…

Machine Learning · Computer Science 2020-04-17 Ahmed T. Elthakeb , Prannoy Pilligundla , FatemehSadat Mireshghallah , Amir Yazdanbakhsh , Hadi Esmaeilzadeh

Automatic low-bit hybrid quantization of neural networks through meta learning

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

Quantization of Deep Neural Networks for Accurate Edge Computing

Deep neural networks (DNNs) have demonstrated their great potential in recent years, exceeding the per-formance of human experts in a wide range of applications. Due to their large sizes, however, compressiontechniques such as weight…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Wentao Chen , Hailong Qiu , Jian Zhuang , Chutong Zhang , Yu Hu , Qing Lu , Tianchen Wang , Yiyu Shi , Meiping Huang , Xiaowe Xu

ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. Such increases in memory and computational demands make deep learning…

Machine Learning · Computer Science 2024-06-07 Daniel Becking , Maximilian Dreyer , Wojciech Samek , Karsten Müller , Sebastian Lapuschkin

Low Precision Neural Networks using Subband Decomposition

Large-scale deep neural networks (DNN) have been successfully used in a number of tasks from image recognition to natural language processing. They are trained using large training sets on large models, making them computationally and…

Machine Learning · Computer Science 2017-03-28 Sek Chai , Aswin Raghavan , David Zhang , Mohamed Amer , Tim Shields

Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To enhance the quantization accuracy, prior works mainly focus on designing advanced quantization algorithms but still fail to achieve…

Computer Vision and Pattern Recognition · Computer Science 2021-09-29 Mingzhu Shen , Feng Liang , Ruihao Gong , Yuhang Li , Chuming Li , Chen Lin , Fengwei Yu , Junjie Yan , Wanli Ouyang

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes…

Machine Learning · Computer Science 2024-06-11 Brian Chmiel , Ron Banner , Elad Hoffer , Hilla Ben Yaacov , Daniel Soudry

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz , Wei Le

Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference

Deep neural networks have achieved state-of-the-art results in a wide range of applications, from natural language processing and computer vision to speech recognition. However, as tasks become increasingly complex, model sizes continue to…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Tomer Gafni , Asaf Karnieli , Yair Hanani