Related papers: Smallify: Learning Network Size while Training
To improve federated training of neural networks, we develop FedSparsify, a sparsification strategy based on progressive weight magnitude pruning. Our method has several benefits. First, since the size of the network becomes increasingly…
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular…
In real applications of Reinforcement Learning (RL), such as robotics, low latency and energy efficient inference is very desired. The use of sparsity and pruning for optimizing Neural Network inference, and particularly to improve energy…
Federated training of large deep neural networks can often be restrictive due to the increasing costs of communicating the updates with increasing model sizes. Various model pruning techniques have been designed in centralized settings to…
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as…
In this work we present a method to improve the pruning step of the current state-of-the-art methodology to compress neural networks. The novelty of the proposed pruning technique is in its differentiability, which allows pruning to be…
Recurrent Neural Networks (RNN) are widely used to solve a variety of problems and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks…
Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically…
Deep Neural Networks are highly over-parameterized and the size of the neural networks can be reduced significantly after training without any decrease in performance. One can clearly see this phenomenon in a wide range of architectures…
This paper presents a new method for pre-training neural networks that can decrease the total training time for a neural network while maintaining the final performance, which motivates its use on deep neural networks. By partitioning the…
We develop a fast end-to-end method for training lightweight neural networks using multiple classifier heads. By allowing the model to determine the importance of each head and rewarding the choice of a single shallow classifier, we are…
Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically…
Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a…
Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often…
Overparameterization and overfitting are common concerns when designing and training deep neural networks, that are often counteracted by pruning and regularization strategies. However, these strategies remain secondary to most learning…
Recent advancements have scaled neural networks to unprecedented sizes, achieving remarkable performance across a wide range of tasks. However, deploying these large-scale models on resource-constrained devices poses significant challenges…
Network pruning reduces the size of neural networks by removing (pruning) neurons such that the performance drop is minimal. Traditional pruning approaches focus on designing metrics to quantify the usefulness of a neuron which is often…
Pruning methods have shown to be effective at reducing the size of deep neural networks while keeping accuracy almost intact. Among the most effective methods are those that prune a network while training it with a sparsity prior loss and…
Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typical pruning methods require large, fully trained networks as a starting point…
We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affects the prediction…