Related papers: Universal Deep Neural Network Compression

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach

Deep Neural Networks (DNNs) are applied in a wide range of use cases. There is an increased demand for deploying DNNs on devices that do not have abundant resources such as memory and computation units. Recently, network compression through…

Machine Learning · Computer Science 2020-05-19 Haichuan Yang , Shupeng Gui , Yuhao Zhu , Ji Liu

Neural Networks Weights Quantization: Target None-retraining Ternary (TNT)

Quantization of weights of deep neural networks (DNN) has proven to be an effective solution for the purpose of implementing DNNs on edge devices such as mobiles, ASICs and FPGAs, because they have no sufficient resources to support…

Machine Learning · Computer Science 2019-12-20 Tianyu Zhang , Lei Zhu , Qian Zhao , Kilho Shin

DP-Net: Dynamic Programming Guided Deep Neural Network Compression

In this work, we propose an effective scheme (called DP-Net) for compressing the deep neural networks (DNNs). It includes a novel dynamic programming (DP) based algorithm to obtain the optimal solution of weight quantization and an…

Machine Learning · Computer Science 2020-03-24 Dingcheng Yang , Wenjian Yu , Ao Zhou , Haoyuan Mu , Gary Yao , Xiaoyi Wang

Compression strategies and space-conscious representations for deep neural networks

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of…

Machine Learning · Computer Science 2020-07-17 Giosuè Cataldo Marinò , Gregorio Ghidoli , Marco Frasca , Dario Malchiodi

Quantization of Deep Neural Networks for Accurate Edge Computing

Deep neural networks (DNNs) have demonstrated their great potential in recent years, exceeding the performance of human experts in a wide range of applications. Due to their large sizes, however, compression techniques such as weight…

Computer Vision and Pattern Recognition · Computer Science 2021-10-15 Wentao Chen , Hailong Qiu , Jian Zhuang , Chutong Zhang , Yu Hu , Qing Lu , Tianchen Wang , Yiyu Shi , Meiping Huang , Xiaowei Xu

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such…

Machine Learning · Computer Science 2017-11-15 Brandon Reagen , Udit Gupta , Robert Adolf , Michael M. Mitzenmacher , Alexander M. Rush , Gu-Yeon Wei , David Brooks

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Machine Learning · Computer Science 2023-12-27 Konstantinos Balaskas , Andreas Karatzas , Christos Sad , Kostas Siozios , Iraklis Anagnostopoulos , Georgios Zervakis , Jörg Henkel

Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM

Weight quantization is one of the most important techniques of Deep Neural Networks (DNNs) model compression method. A recent work using systematic framework of DNN weight quantization with the advanced optimization algorithm ADMM…

Machine Learning · Computer Science 2019-05-03 Sheng Lin , Xiaolong Ma , Shaokai Ye , Geng Yuan , Kaisheng Ma , Yanzhi Wang

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to…

Machine Learning · Statistics 2024-03-04 Lingyu Gu , Yongqi Du , Yuan Zhang , Di Xie , Shiliang Pu , Robert C. Qiu , Zhenyu Liao

Universal and Succinct Source Coding of Deep Neural Networks

Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling…

Information Theory · Computer Science 2020-03-10 Sourya Basu , Lav R. Varshney

A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods

To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic…

Machine Learning · Computer Science 2020-04-14 Tianyun Zhang , Xiaolong Ma , Zheng Zhan , Shanglin Zhou , Minghai Qin , Fei Sun , Yen-Kuang Chen , Caiwen Ding , Makan Fardad , Yanzhi Wang

Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper…

Computer Vision and Pattern Recognition · Computer Science 2021-05-10 Mohammed F. Tolba , Huruy Tekle Tesfai , Hani Saleh , Baker Mohammad , Mahmoud Al-Qutayri

Automatic low-bit hybrid quantization of neural networks through meta learning

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power consumption budget. The uniform bit…

Machine Learning · Computer Science 2020-04-27 Tao Wang , Junsong Wang , Chang Xu , Chao Xue

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

Deep neural networks (DNNs) have achieved significant success in a variety of real world applications, i.e., image classification. However, tons of parameters in the networks restrict the efficiency of neural networks due to the large model…

Machine Learning · Computer Science 2019-08-21 Yuzhe Ma , Ran Chen , Wei Li , Fanhua Shang , Wenjian Yu , Minsik Cho , Bei Yu

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has a lot of potentials to increase inference speed leveraging bit-operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang , Jiaolong Yang , Dongqiangzi Ye , Gang Hua

Towards Explaining Deep Neural Network Compression Through a Probabilistic Latent Space

Despite the impressive performance of deep neural networks (DNNs), their computational complexity and storage space consumption have led to the concept of network compression. While DNN compression techniques such as pruning and low-rank…

Machine Learning · Computer Science 2025-07-04 Mahsa Mozafari-Nia , Salimeh Yasaei Sekeh

Model compression via distillation and quantization

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep…

Neural and Evolutionary Computing · Computer Science 2018-02-16 Antonio Polino , Razvan Pascanu , Dan Alistarh

Recurrence of Optimum for Training Weight and Activation Quantized Networks

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for…

Machine Learning · Computer Science 2021-05-25 Ziang Long , Penghang Yin , Jack Xin

A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Many model compression techniques of Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering and quantization, etc. Weight pruning leverages the redundancy in the number of weights in DNNs, while…

Neural and Evolutionary Computing · Computer Science 2018-11-06 Shaokai Ye , Tianyun Zhang , Kaiqi Zhang , Jiayu Li , Jiaming Xie , Yun Liang , Sijia Liu , Xue Lin , Yanzhi Wang

A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions

Deep Neural Network (DNN) has gained unprecedented performance due to its automated feature extraction capability. This high order performance leads to significant incorporation of DNN models in different Internet of Things (IoT)…

Machine Learning · Computer Science 2020-10-09 Rahul Mishra , Hari Prabhat Gupta , Tanima Dutta