Related papers: Dirichlet Pruning for Neural Network Compression
In this work we present a method to improve the pruning step of the current state-of-the-art methodology to compress neural networks. The novelty of the proposed pruning technique is in its differentiability, which allows pruning to be…
Pruning is a standard technique for removing unnecessary structure from a neural network to reduce its storage footprint, computational demands, or energy consumption. Pruning can reduce the parameter-counts of many state-of-the-art neural…
Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel \emph{self-distillation} based pruning strategy, whereby the representational similarity between the…
Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of…
Deep Learning models have become the dominant approach in several areas due to their high performance. Unfortunately, the size and hence computational requirements of operating such models can be considerably high. Therefore, this…
Network pruning reduces the size of neural networks by removing (pruning) neurons such that the performance drop is minimal. Traditional pruning approaches focus on designing metrics to quantify the usefulness of a neuron which is often…
We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural…
Channel pruning is a promising technique to compress the parameters of deep convolutional neural networks(DCNN) and to speed up the inference. This paper aims to address the long-standing inefficiency of channel pruning. Most channel…
Channel pruning, which seeks to reduce the model size by removing redundant channels, is a popular solution for deep networks compression. Existing channel pruning methods usually conduct layer-wise channel selection by directly minimizing…
Network compression is crucial to making the deep networks to be more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there lacks a theoretical framework to…
Pruning is a compression method which aims to improve the efficiency of neural networks by reducing their number of parameters while maintaining a good performance, thus enhancing the performance-to-cost ratio in nontrivial ways. Of…
In recent years, deep neural networks have achieved great success in the field of computer vision. However, it is still a big challenge to deploy these deep models on resource-constrained embedded devices such as mobile robots, smart phones…
How to develop slim and accurate deep neural networks has become crucial for real- world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most…
We introduce a pruning algorithm that provably sparsifies the parameters of a trained model in a way that approximately preserves the model's predictive accuracy. Our algorithm uses a small batch of input points to construct a data-informed…
Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely-used practice in relevant work assumes that a…
Network pruning is one of the most dominant methods for reducing the heavy inference cost of deep neural networks. Existing methods often iteratively prune networks to attain high compression ratio without incurring significant loss in…
Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks…
Compression techniques for deep neural networks are important for implementing them on small embedded devices. In particular, channel-pruning is a useful technique for realizing compact networks. However, many conventional methods require…
This paper describes a channel-selection approach for simplifying deep neural networks. Specifically, we propose a new type of generic network layer, called pruning layer, to seamlessly augment a given pre-trained model for compression.…
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time. To restore accuracy after pruning, fine-tuning is usually applied to pruned networks. However, too few remaining…