English
Related papers

Related papers: Gradient-based Weight Density Balancing for Robust…

200 papers

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers , Luke Zettlemoyer

Sparse training is a natural idea to accelerate the training speed of deep neural networks and save the memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods…

Machine Learning · Computer Science 2021-11-11 Xiao Zhou , Weizhong Zhang , Zonghao Chen , Shizhe Diao , Tong Zhang

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these…

Machine Learning · Computer Science 2023-12-06 Bowen Lei , Dongkuan Xu , Ruqi Zhang , Shuren He , Bani K. Mallick

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These…

Machine Learning · Computer Science 2020-05-15 Junjie Liu , Zhe Xu , Runbin Shi , Ray C. C. Cheung , Hayden K. H. So

Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this…

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained…

Machine Learning · Computer Science 2021-06-16 Shiwei Liu , Decebal Constantin Mocanu , Yulong Pei , Mykola Pechenizkiy

Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away…

Machine Learning · Computer Science 2019-09-10 Aidan N. Gomez , Ivan Zhang , Siddhartha Rao Kamalakara , Divyam Madaan , Kevin Swersky , Yarin Gal , Geoffrey E. Hinton

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-10 Linnan Wang , Wei Wu , Junyu Zhang , Hang Liu , George Bosilca , Maurice Herlihy , Rodrigo Fonseca

In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question, which weights to share between layers, is difficult to answer, human-designed…

Machine Learning · Computer Science 2020-03-24 Jonas Prellberg , Oliver Kramer

One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient…

Machine Learning · Computer Science 2023-02-10 Berivan Isik , Francesco Pase , Deniz Gunduz , Tsachy Weissman , Michele Zorzi

Random pruning is arguably the most naive way to attain sparsity in neural networks, but has been deemed uncompetitive by either post-training pruning or sparse training. In this paper, we focus on sparse training and highlight a perhaps…

Machine Learning · Computer Science 2022-02-08 Shiwei Liu , Tianlong Chen , Xiaohan Chen , Li Shen , Decebal Constantin Mocanu , Zhangyang Wang , Mykola Pechenizkiy

We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global…

Optimization and Control · Mathematics 2025-01-13 David A. R. Robin , Kevin Scaman , Marc Lelarge

Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a…

Computer Vision and Pattern Recognition · Computer Science 2021-04-08 Jonathan W. Siegel , Jianhong Chen , Pengchuan Zhang , Jinchao Xu

This paper proposes to learn high-performance deep ConvNets with sparse neural connections, referred to as sparse ConvNets, for face recognition. The sparse ConvNets are learned in an iterative way, each time one additional layer is…

Computer Vision and Pattern Recognition · Computer Science 2015-12-08 Yi Sun , Xiaogang Wang , Xiaoou Tang

Deep neural networks have dramatically transformed machine learning, but their memory and energy demands are substantial. The requirements of real biological neural networks are rather modest in comparison, and one feature that might…

Machine Learning · Computer Science 2020-07-27 Tianlin Liu , Friedemann Zenke

Turning the weights to zero when training a neural network helps in reducing the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Antoine Vanderschueren , Christophe De Vleeschouwer

Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the…

Machine Learning · Computer Science 2023-09-11 Denis Kuznedelev , Eldar Kurtic , Eugenia Iofinova , Elias Frantar , Alexandra Peste , Dan Alistarh

Recently, there have been increasing demands to construct compact deep architectures to remove unnecessary redundancy and to improve the inference speed. While many recent works focus on reducing the redundancy by eliminating unneeded…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Eunwoo Kim , Chanho Ahn , Songhwai Oh

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…

Machine Learning · Statistics 2016-05-26 Jialei Wang , Mladen Kolar , Nathan Srebro , Tong Zhang

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer…

Machine Learning · Computer Science 2025-06-23 Guozheng Ma , Lu Li , Zilin Wang , Li Shen , Pierre-Luc Bacon , Dacheng Tao
‹ Prev 1 2 3 10 Next ›