Novel Gradient Sparsification Algorithm via Bayesian Inference

Ali Bereyhi; Ben Liang; Gary Boudreau; Ali Afana

Machine Learning 2024-09-24 v1 Information Theory Signal Processing math.IT

Abstract

Error accumulation is an essential component of the Top-$k$ sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slow-down of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm called regularized Top-$k$ (RegTop-$k$) that controls the learning rate scaling of error accumulation. The algorithm is developed by looking at the gradient sparsification as an inference problem and determining a Bayesian optimal sparsification mask via maximum-a-posteriori estimation. It utilizes past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the local gradient entries. Numerical experiments with ResNet-18 on CIFAR-10 show that at $0.1\%$ sparsification, RegTop-$k$ achieves about $8\%$ higher accuracy than standard Top-$k$.
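
For readers unfamiliar with the baseline, the following is a minimal sketch of standard Top-$k$ sparsification with error accumulation on a single worker. It is not the RegTop-$k$ algorithm, whose posterior-based prioritization is not specified in this abstract. The function names `top_k_mask` and `sparsify_with_error_accumulation` and the toy gradients are assumptions made for illustration only.

```python
import numpy as np


def top_k_mask(vector, k):
    """Boolean mask selecting the k entries of largest magnitude."""
    mask = np.zeros(vector.shape, dtype=bool)
    if k > 0:
        # argpartition places the k largest magnitudes in the last k slots
        idx = np.argpartition(np.abs(vector), -k)[-k:]
        mask[idx] = True
    return mask


def sparsify_with_error_accumulation(gradient, error, k):
    """One worker-side step (hypothetical helper, for illustration).

    Returns (sparse_update, new_error).
    """
    accumulated = gradient + error            # add back entries dropped earlier
    mask = top_k_mask(accumulated, k)
    sparse_update = np.where(mask, accumulated, 0.0)
    new_error = accumulated - sparse_update   # unsent entries are carried forward
    return sparse_update, new_error


# Toy run with k = 1 on a 4-dimensional gradient (made-up values).
error = np.zeros(4)
g1 = np.array([0.5, -3.0, 0.25, 0.0])
sent1, error = sparsify_with_error_accumulation(g1, error, k=1)

g2 = np.array([0.75, 0.0, 0.25, 0.0])
sent2, error = sparsify_with_error_accumulation(g2, error, k=1)
```

In this toy run the first entry is dropped in the first step and then transmitted in the second step with its accumulated value, larger than the fresh gradient entry alone. This is one way to see the implicit learning rate scaling that the abstract attributes to error accumulation.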

Keywords

stochastic gradient descent, sparse optimization, randomized algorithm

Cite

@article{arxiv.2409.14893,
  title  = {Novel Gradient Sparsification Algorithm via Bayesian Inference},
  author = {Ali Bereyhi and Ben Liang and Gary Boudreau and Ali Afana},
  journal= {arXiv preprint arXiv:2409.14893},
  year   = {2024}
}

Comments

To appear in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024
