Related papers: Guaranteed Quantization Error Computation for Neur…

Model compression as constrained optimization, with application to neural nets. Part II: quantization

We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with $K$ entries so that the training loss of the quantized net is minimal.…

Machine Learning · Computer Science 2017-07-17 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most of) the layers, which often suffer significant accuracy degradation…

Computer Vision and Pattern Recognition · Computer Science 2021-10-14 Weihan Chen, Peisong Wang, Jian Cheng

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Adaptive Quantization for Deep Neural Network

In recent years Deep Neural Networks (DNNs) have been rapidly developed in various applications, together with increasingly complex architectures. The performance gain of these DNNs generally comes with high computational costs and large…

Machine Learning · Computer Science 2017-12-05 Yiren Zhou, Seyed-Mohsen Moosavi-Dezfooli, Ngai-Man Cheung, Pascal Frossard

Model compression as constrained optimization, with application to neural nets. Part V: combining compressions

Model compression is generally performed by using quantization, low-rank approximation or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what types of compression work better for a…

Machine Learning · Computer Science 2021-07-12 Miguel Á. Carreira-Perpiñán, Yerlan Idelbayev

Neural Image Compression with Quantization Rectifier

Neural image compression has been shown to outperform traditional image codecs in terms of rate-distortion performance. However, quantization introduces errors in the compression process, which can degrade the quality of the compressed…

Machine Learning · Computer Science 2024-03-27 Wei Luo, Bo Chen

Mixed-Precision Neural Networks: A Survey

Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that…

Machine Learning · Computer Science 2022-08-15 Mariam Rakka, Mohammed E. Fouda, Pramod Khargonekar, Fadi Kurdahi

Optimizing Deep Neural Networks using Safety-Guided Self Compression

The deployment of deep neural networks on resource-constrained devices necessitates effective model compression strategies that judiciously balance the reduction of model size with the preservation of performance. This study introduces a…

Machine Learning · Computer Science 2025-05-02 Mohammad Zbeeb, Mariam Salman, Mohammad Bazzi, Ammar Mohanna

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science 2021-06-16 Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang

Safety Verification of Neural Network Control Systems Using Guaranteed Neural Network Model Reduction

This paper aims to enhance the computational efficiency of safety verification of neural network control systems by developing a guaranteed neural network model reduction method. First, a concept of model reduction precision is proposed to…

Machine Learning · Computer Science 2023-01-19 Weiming Xiang, Zhongzhu Shao

On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks

Compression is a key step to deploy large neural networks on resource-constrained platforms. As a popular compression technique, quantization constrains the number of distinct weight values and thus reducing the number of bits required to…

Machine Learning · Computer Science 2019-01-15 Yukun Ding, Jinglan Liu, Jinjun Xiong, Yiyu Shi

Robust error bounds for quantised and pruned neural networks

With the rise of smartphones and the internet-of-things, data is increasingly getting generated at the edge on local, personal devices. For privacy, latency and energy saving reasons, this shift is causing machine learning algorithms to…

Machine Learning · Computer Science 2021-04-29 Jiaqi Li, Ross Drummond, Stephen R. Duncan

Compression Repair for Feedforward Neural Networks Based on Model Equivalence Evaluation

In this paper, we propose a method of repairing compressed Feedforward Neural Networks (FNNs) based on equivalence evaluation of two neural networks. In the repairing framework, a novel neural network equivalence evaluation method is…

Machine Learning · Computer Science 2024-02-20 Zihao Mo, Yejiang Yang, Shuaizheng Lu, Weiming Xiang

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science 2023-06-13 Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss

Based on the model's resilience to computational noise, model quantization is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In the…

Machine Learning · Computer Science 2022-07-22 Daning Cheng, Wenguang Chen

Weight Normalization based Quantization for Deep Neural Network Compression

With the development of deep neural networks, the size of network models becomes larger and larger. Model compression has become an urgent need for deploying these network models to mobile or embedded devices. Model quantization is a…

Machine Learning · Computer Science 2019-07-02 Wen-Pu Cai, Wu-Jun Li

Reducing Computational Complexity of Neural Networks in Optical Channel Equalization: From Concepts to Implementation

In this paper, a new methodology is proposed that allows for the low-complexity development of neural network (NN) based equalizers for the mitigation of impairments in high-speed coherent optical transmission systems. In this work, we…

Signal Processing · Electrical Eng. & Systems 2023-08-02 Pedro J. Freire, Antonio Napoli, Diego Arguello Ron, Bernhard Spinnler, Michael Anderson, Wolfgang Schairer, Thomas Bex, Nelson Costa, Sergei K. Turitsyn, Jaroslaw E. Prilepsky

A Survey of Quantization Methods for Efficient Neural Network Inference

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related…

Computer Vision and Pattern Recognition · Computer Science 2021-06-23 Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

Towards the Limit of Network Quantization

Network quantization is one of network compression techniques to reduce the redundancy of deep neural networks. It reduces the number of distinct network parameter values by quantization in order to save the storage for them. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2017-11-15 Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

Towards Efficient Verification of Quantized Neural Networks

Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying…

Machine Learning · Computer Science 2023-12-29 Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett