Coordinating Distributed Example Orders for Provably Accelerated Training

Machine Learning | 2023-12-25 | v5 | Distributed, Parallel, and Cluster Computing | Optimization and Control

Abstract

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples, achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
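
To make the idea of ordering examples from stale gradients more concrete, the following is a minimal sketch of one balancing-style reordering pass. It is an illustration under stated assumptions, not the authors' implementation: the function names `balance_reorder` and `max_prefix_norm` are hypothetical, `grads` stands in for per-example gradients recorded in a prior epoch, and the greedy sign rule is a simplification that the abstract itself does not specify.

```python
def balance_reorder(order, grads):
    """Return a new example order from stale per-example gradients.

    Assumption: each centered gradient gets a greedy sign (+1 or -1) that
    keeps the signed running sum small; +1 examples go to the front of the
    next order, -1 examples go to the back in reverse.
    """
    dim = len(grads[0])
    n = len(order)
    # Center the gradients so that they sum to zero over the epoch.
    mean = [sum(grads[i][d] for i in order) / n for d in range(dim)]
    running = [0.0] * dim
    front, back = [], []
    for i in order:
        centered = [grads[i][d] - mean[d] for d in range(dim)]
        plus = sum((running[d] + centered[d]) ** 2 for d in range(dim))
        minus = sum((running[d] - centered[d]) ** 2 for d in range(dim))
        if plus <= minus:  # sign +1 keeps the running sum no larger
            running = [running[d] + centered[d] for d in range(dim)]
            front.append(i)
        else:              # sign -1 keeps the running sum smaller
            running = [running[d] - centered[d] for d in range(dim)]
            back.append(i)
    return front + back[::-1]


def max_prefix_norm(order, grads):
    """Largest Euclidean norm over all prefix sums of centered gradients."""
    dim = len(grads[0])
    n = len(order)
    mean = [sum(grads[i][d] for i in order) / n for d in range(dim)]
    prefix = [0.0] * dim
    worst = 0.0
    for i in order:
        prefix = [prefix[d] + grads[i][d] - mean[d] for d in range(dim)]
        worst = max(worst, sum(p * p for p in prefix) ** 0.5)
    return worst


# Toy one-dimensional gradients, purely illustrative.
stale_grads = [[1.0], [1.0], [-1.0], [-1.0]]
old_order = [0, 1, 2, 3]
new_order = balance_reorder(old_order, stale_grads)
```

On this toy input the reordered sequence interleaves positive and negative gradients, so the largest prefix sum of centered gradients shrinks relative to the original order. Keeping such prefix sums small is the intuition behind preferring a balanced permutation over an arbitrary one.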

Cite

@article{arxiv.2302.00845,
  title  = {Coordinating Distributed Example Orders for Provably Accelerated Training},
  author = {A. Feder Cooper and Wentao Guo and Khiem Pham and Tiancheng Yuan and Charlie F. Ruan and Yucheng Lu and Christopher De Sa},
  journal= {arXiv preprint arXiv:2302.00845},
  year   = {2023}
}

Comments

NeurIPS 2023
