Related papers: Differentiable Network Pruning for Microcontroller…
We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural…
Real time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular…
Pruning is one of the most effective model reduction techniques. Deep networks require massive computation and such models need to be compressed to bring them on edge devices. Most existing pruning techniques are focused on vision-based…
Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose the novel idea of the network…
The rapid growth of resource-constrained mobile platforms, including mobile robots, wearable systems, and Internet-of-Things devices, has increased the demand for computationally efficient neural network controllers (NNCs) that can operate…
Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees, and…
Deep neural networks (DNNs) offer significant flexibility and robust performance. This makes them ideal for building not only system models but also advanced neural network controllers (NNCs). However, their high complexity and…
Convolutional neural networks are prevailing in deep learning tasks. However, they suffer from massive cost issues when working on mobile devices. Network pruning is an effective method of model compression to handle such problems. This…
Pruning is one of the major methods to compress deep neural networks. In this paper, we propose an Ising energy model within an optimization framework for pruning convolutional kernels and hidden units. This model is designed to reduce…
Deep neural networks have been applied in many applications exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant…
Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things,…
The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the…
Artificial neural network pruning is a method in which artificial neural network sizes can be reduced while attempting to preserve the predicting capabilities of the network. This is done to make the model smaller or faster during inference…
Recent advances in Artificial Intelligence (AI) on the Internet of Things (IoT)-enabled network edge has realized edge intelligence in several applications such as smart agriculture, smart hospitals, and smart factories by enabling…
In recent years, deep neural networks have known a wide success in various application domains. However, they require important computational and memory resources, which severely hinders their deployment, notably on mobile devices or for…
DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning…
Model compression is crucial for deployment of neural networks on devices with limited computational and memory resources. Many different methods show comparable accuracy of the compressed model and similar compression rates. However, the…
Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature…
In this work we present a method to improve the pruning step of the current state-of-the-art methodology to compress neural networks. The novelty of the proposed pruning technique is in its differentiability, which allows pruning to be…
Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of…