Related papers: Effective Model Compression via Stage-wise Pruning
Modern deep neural network models are large and computationally intensive. One typical solution to this issue is model pruning. However, most current pruning algorithms depend on hand crafted rules or domain expertise. To overcome this…
In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure…
Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation…
Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to…
In this work, we propose a simple but effective channel pruning framework called Progressive Channel Pruning (PCP) to accelerate Convolutional Neural Networks (CNNs). In contrast to the existing channel pruning methods that prune channels…
In recent years, deep neural networks have achieved great success in the field of computer vision. However, it is still a big challenge to deploy these deep models on resource-constrained embedded devices such as mobile robots, smart phones…
Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and…
Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…
Channel pruning is a promising technique to compress the parameters of deep convolutional neural networks(DCNN) and to speed up the inference. This paper aims to address the long-standing inefficiency of channel pruning. Most channel…
Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline significantly increases the overall training time. In this paper, we develop a…
Filter pruning is a common method to achieve model compression and acceleration in deep neural networks (DNNs).Some research regarded filter pruning as a combinatorial optimization problem and thus used evolutionary algorithms (EA) to prune…
Besides accuracy, the model size of convolutional neural networks (CNN) models is another important factor considering limited hardware resources in practical applications. For example, employing deep neural networks on mobile systems…
Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature…
In recent years, deep neural networks have known a wide success in various application domains. However, they require important computational and memory resources, which severely hinders their deployment, notably on mobile devices or for…
Convolutional Neural Networks (CNNs) have a large number of parameters and take significantly large hardware resources to compute, so edge devices struggle to run high-level networks. This paper proposes a novel method to reduce the…
Pruning has become a very powerful and effective technique to compress and accelerate modern neural networks. Existing pruning methods can be grouped into two categories: filter pruning (FP) and weight pruning (WP). FP wins at hardware…
Pruning is a neural network optimization technique that sacrifices accuracy in exchange for lower computational requirements. Pruning has been useful when working with extremely constrained environments in tinyML. Unfortunately, special…
Deep Convolutional Neural Networks (CNN) has achieved significant success in computer vision field. However, the high computational cost of the deep complex models prevents the deployment on edge devices with limited memory and…
Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose the novel idea of the network…
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost to meet the deployment requirements. However, such a step-wise application of pruning and…