Related papers: Structured Compression by Weight Encryption for Un…

Quantized Sparse Weight Decomposition for Neural Network Compression

In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target…

Machine Learning · Computer Science 2022-07-25 Andrey Kuzmin , Mart van Baalen , Markus Nagel , Arash Behboodi

Compression strategies and space-conscious representations for deep neural networks

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of…

Machine Learning · Computer Science 2020-07-17 Giosuè Cataldo Marinò , Gregorio Ghidoli , Marco Frasca , Dario Malchiodi

Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for…

Machine Learning · Computer Science 2024-11-22 Xizi Chen , Jingyang Zhu , Jingbo Jiang , Chi-Ying Tsui

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression

Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods, thus, usually…

Machine Learning · Computer Science 2022-02-01 Baeseong Park , Se Jung Kwon , Daehwan Oh , Byeongwook Kim , Dongsoo Lee

Low-Rank+Sparse Tensor Compression for Neural Networks

Low-rank tensor compression has been proposed as a promising approach to reduce the memory and compute requirements of neural networks for their deployment on edge devices. Tensor compression reduces the number of parameters required to…

Machine Learning · Computer Science 2021-11-03 Cole Hawkins , Haichuan Yang , Meng Li , Liangzhen Lai , Vikas Chandra

FleXOR: Trainable Fractional Quantization

Quantization based on the binary codes is gaining attention because each quantized bit can be directly utilized for computations without dequantization using look-up tables. Previous attempts, however, only allow for integer numbers of…

Machine Learning · Computer Science 2020-10-23 Dongsoo Lee , Se Jung Kwon , Byeongwook Kim , Yongkweon Jeon , Baeseong Park , Jeongin Yun

Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations

Deep neural networks (DNNs) usually demand a large amount of operations for real-time inference. Especially, fully-connected layers contain a large number of weights, thus they usually need many off-chip memory accesses for inference. We…

Computer Vision and Pattern Recognition · Computer Science 2017-07-13 Yoonho Boo , Wonyong Sung

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin , Wei Jiang , Wei Wang , Kaidi Xu , Yanzhi Wang , Shan Liu , Songnan Li

SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang , Nan Jiang , Guang Lin , Qifan Song

Compact representations of convolutional neural networks via weight pruning and quantization

The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing,…

Machine Learning · Computer Science 2021-08-31 Giosuè Cataldo Marinò , Alessandro Petrini , Dario Malchiodi , Marco Frasca

Compact and Computationally Efficient Representation of Deep Neural Networks

At the core of any inference procedure in deep neural networks are dot product operations, which are the component that require the highest computational resources. A common approach to reduce the cost of inference is to reduce its memory…

Machine Learning · Computer Science 2018-12-19 Simon Wiedemann , Klaus-Robert Müller , Wojciech Samek

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such…

Machine Learning · Computer Science 2017-11-15 Brandon Reagen , Udit Gupta , Robert Adolf , Michael M. Mitzenmacher , Alexander M. Rush , Gu-Yeon Wei , David Brooks

A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation

The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance. As a result, computation and memory costs in resource-limited applications may be significantly reduced by dropping…

Machine Learning · Statistics 2022-11-10 Wenjing Yang , Ganghua Wang , Jie Ding , Yuhong Yang

Towards Optimal Compression: Joint Pruning and Quantization

Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware. The prevailing methods for network compression, namely quantization and pruning, have been shown to enhance efficiency at the…

Machine Learning · Computer Science 2023-06-13 Ben Zandonati , Glenn Bucagu , Adrian Alan Pol , Maurizio Pierini , Olya Sirkin , Tal Kopetz

To prune, or not to prune: exploring the efficacy of pruning for model compression

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…

Machine Learning · Statistics 2017-11-15 Michael Zhu , Suyog Gupta

WeightMom: Learning Sparse Networks using Iterative Momentum-based pruning

Deep Neural Networks have been used in a wide variety of applications with significant success. However, their highly complex nature owing to comprising millions of parameters has lead to problems during deployment in pipelines with low…

Machine Learning · Computer Science 2022-08-15 Elvis Johnson , Xiaochen Tang , Sriramacharyulu Samudrala

End-to-End Supermask Pruning: Learning to Prune Image Captioning Models

With the advancement of deep models, research work on image captioning has led to a remarkable gain in raw performance over the last decade, along with increasing model complexity and computational cost. However, surprisingly works on…

Computer Vision and Pattern Recognition · Computer Science 2021-10-08 Jia Huei Tan , Chee Seng Chan , Joon Huang Chuah

Neural Network Compression using Binarization and Few Full-Precision Weights

Quantization and pruning are two effective Deep Neural Networks model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the…

Computer Vision and Pattern Recognition · Computer Science 2023-09-18 Franco Maria Nardini , Cosimo Rulli , Salvatore Trani , Rossano Venturini

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee , Byeongwook Kim

Focused Quantization for Sparse CNNs

Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing…

Machine Learning · Computer Science 2019-10-30 Yiren Zhao , Xitong Gao , Daniel Bates , Robert Mullins , Cheng-Zhong Xu