Related papers: Layer-Wise Data-Free CNN Compression

A simple approach for quantizing neural networks

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving…

Machine Learning · Computer Science · 2023-04-06 · Johannes Maly, Rayan Saab

Data-Independent Structured Pruning of Neural Networks via Coresets

Model compression is crucial for deployment of neural networks on devices with limited computational and memory resources. Many different methods show comparable accuracy of the compressed model and similar compression rates. However, the…

Machine Learning · Computer Science · 2020-08-21 · Ben Mussay, Daniel Feldman, Samson Zhou, Vladimir Braverman, Margarita Osadchy

Data-Free Backbone Fine-Tuning for Pruned Neural Networks

Model compression techniques reduce the computational load and memory consumption of deep neural networks. After the compression operation, e.g. parameter pruning, the model is normally fine-tuned on the original training dataset to recover…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-23 · Adrian Holzbock, Achyut Hegde, Klaus Dietmayer, Vasileios Belagiannis

Data-Independent Neural Pruning via Coresets

Previous work showed empirically that large neural networks can be significantly reduced in size while preserving their accuracy. Model compression became a central research topic, as it is crucial for deployment of neural networks on…

Machine Learning · Computer Science · 2020-01-06 · Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman

Compact representations of convolutional neural networks via weight pruning and quantization

The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing,…

Machine Learning · Computer Science · 2021-08-31 · Giosuè Cataldo Marinò, Alessandro Petrini, Dario Malchiodi, Marco Frasca

The Knowledge Within: Methods for Data-Free Model Compression

Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a…

Machine Learning · Computer Science · 2020-04-08 · Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry

C2S2: Cost-aware Channel Sparse Selection for Progressive Network Pruning

This paper describes a channel-selection approach for simplifying deep neural networks. Specifically, we propose a new type of generic network layer, called pruning layer, to seamlessly augment a given pre-trained model for compression.…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-09 · Chih-Yao Chiu, Hwann-Tzong Chen, Tyng-Luh Liu

Post-training deep neural network pruning via layer-wise calibration

We present a post-training weight pruning method for deep neural networks that achieves accuracy levels tolerable for the production setting and that is sufficiently fast to be run on commodity hardware such as desktop CPUs or edge devices.…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-03 · Ivan Lazarevich, Alexander Kozlov, Nikita Malinin

Compression strategies and space-conscious representations for deep neural networks

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of…

Machine Learning · Computer Science · 2020-07-17 · Giosuè Cataldo Marinò, Gregorio Ghidoli, Marco Frasca, Dario Malchiodi

A Data-Free Analytical Quantization Scheme for Deep Learning Models

Despite the success of CNN models on a variety of image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-10 · Ahmed Luqman, Khuzemah Qazi, Murray Patterson, Malik Jahan Khan, Imdadullah Khan

Neural network compression via learnable wavelet transforms

Wavelets are well known for data compression, yet have rarely been applied to the compression of neural networks. This paper shows how the fast wavelet transform can be used to compress linear layers in neural networks. Linear layers still…

Machine Learning · Computer Science · 2020-08-21 · Moritz Wolter, Shaohui Lin, Angela Yao

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-16 · Tailin Liang, John Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang

Joint Quantization and Pruning Neural Networks Approach: A Case Study on FSO Receivers

Towards fast, hardware-efficient, and low-complexity receivers, we propose a compression-aware learning approach and examine it on free-space optical (FSO) receivers for turbulence mitigation. The learning approach jointly quantizes, prunes,…

Signal Processing · Electrical Eng. & Systems · 2026-01-13 · Mohanad Obeed, Ming Jian

Gradient-Free Training of Quantized Neural Networks

Training neural networks requires significant computational resources and energy. Methods like mixed-precision and quantization-aware training reduce bit usage, yet they still depend heavily on computationally expensive gradient-based…

Machine Learning · Computer Science · 2025-09-30 · Noa Cohen, Omkar Joglekar, Dotan Di Castro, Vladimir Tchuiev, Shir Kozlovsky, Michal Moshkovitz

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

How to develop slim and accurate deep neural networks has become crucial for real-world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most…

Neural and Evolutionary Computing · Computer Science · 2019-10-02 · Xin Dong, Shangyu Chen, Sinno Jialin Pan

Towards Compact CNNs via Collaborative Compression

Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression. However, these two techniques are traditionally deployed in an isolated manner, leading to significant accuracy drop…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-25 · Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji

Efficient Inference of CNNs via Channel Pruning

The deployment of Convolutional Neural Networks (CNNs) on resource-constrained platforms such as mobile devices and embedded systems has been greatly hindered by their high implementation cost, and thus motivated a lot of research interest in…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-12 · Boyu Zhang, Azadeh Davoodi, Yu Hen Hu

Transform Quantization for CNN (Convolutional Neural Network) Compression

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-09 · Sean I. Young, Wang Zhe, David Taubman, Bernd Girod

Activation Density driven Energy-Efficient Pruning in Training

Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typical pruning methods require large, fully trained networks as a starting point…

Machine Learning · Computer Science · 2020-10-13 · Timothy Foldy-Porto, Yeshwanth Venkatesha, Priyadarshini Panda

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science · 2018-11-14 · Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen