Related papers: Gradient-based Weight Density Balancing for Robust…

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers, Luke Zettlemoyer

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Sparse training is a natural idea to accelerate the training speed of deep neural networks and reduce memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods…

Machine Learning · Computer Science 2021-11-11 Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang

Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these…

Machine Learning · Computer Science 2023-12-06 Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These…

Machine Learning · Computer Science 2020-05-15 Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So

Learning where to learn: Gradient sparsity in meta and continual learning

Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this…

Machine Learning · Computer Science 2021-10-28 Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

Selfish Sparse RNN Training

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained…

Machine Learning · Computer Science 2021-06-16 Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

Learning Sparse Networks Using Targeted Dropout

Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away…

Machine Learning · Computer Science 2019-09-10 Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-10 Linnan Wang, Wei Wu, Junyu Zhang, Hang Liu, George Bosilca, Maurice Herlihy, Rodrigo Fonseca

Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent

In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question, which weights to share between layers, is difficult to answer, human-designed…

Machine Learning · Computer Science 2020-03-24 Jonas Prellberg, Oliver Kramer

Sparse Random Networks for Communication-Efficient Federated Learning

One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient…

Machine Learning · Computer Science 2023-02-10 Berivan Isik, Francesco Pase, Deniz Gunduz, Tsachy Weissman, Michele Zorzi

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training. In this paper, we focus on sparse training and highlight a perhaps…

Machine Learning · Computer Science 2022-02-08 Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy

Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global…

Optimization and Control · Mathematics 2025-01-13 David A. R. Robin, Kevin Scaman, Marc Lelarge

Training Sparse Neural Networks using Compressed Sensing

Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a…

Computer Vision and Pattern Recognition · Computer Science 2021-04-08 Jonathan W. Siegel, Jianhong Chen, Pengchuan Zhang, Jinchao Xu

Sparsifying Neural Network Connections for Face Recognition

This paper proposes to learn high-performance deep ConvNets with sparse neural connections, referred to as sparse ConvNets, for face recognition. The sparse ConvNets are learned in an iterative way, each time one additional layer is…

Computer Vision and Pattern Recognition · Computer Science 2015-12-08 Yi Sun, Xiaogang Wang, Xiaoou Tang

Finding trainable sparse networks through Neural Tangent Transfer

Deep neural networks have dramatically transformed machine learning, but their memory and energy demands are substantial. The requirements of real biological neural networks are rather modest in comparison, and one feature that might…

Machine Learning · Computer Science 2020-07-27 Tianlin Liu, Friedemann Zenke

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

Turning the weights to zero when training a neural network helps in reducing the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Antoine Vanderschueren, Christophe De Vleeschouwer

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the…

Machine Learning · Computer Science 2023-09-11 Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

NestedNet: Learning Nested Sparse Structures in Deep Neural Networks

Recently, there have been increasing demands to construct compact deep architectures to remove unnecessary redundancy and to improve the inference speed. While many recent works focus on reducing the redundancy by eliminating unneeded…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Eunwoo Kim, Chanho Ahn, Songhwai Oh

Efficient Distributed Learning with Sparsity

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…

Machine Learning · Statistics 2016-05-26 Jialei Wang, Mladen Kolar, Nathan Srebro, Tong Zhang

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer…

Machine Learning · Computer Science 2025-06-23 Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao