Meta-learning Optimizers for Communication-Efficient Learning

Machine Learning 2025-06-13 v2

Abstract

Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches take multiple gradient steps locally on each worker before averaging model parameters, which helps relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate whether recent progress in the emerging area of learned optimizers can close this gap in homogeneous data and homogeneous device settings while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Our learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore show the potential of learned optimizers for improving communication-efficient distributed learning.
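
To make the local SGD pattern described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy least-squares problem with homogeneous data shards, and all names (`local_sgd_round`, `average_update`, `train`) are hypothetical. The `global_update` hook marks where a meta-learned optimizer would act on the averaged local update; here it is filled with plain averaging, which corresponds to standard local SGD.

```python
import numpy as np

def local_sgd_round(w, shards, local_steps=4, lr=0.1):
    """One communication round: every worker starts from the shared weights,
    takes `local_steps` gradient steps on its own shard, and the resulting
    weight deltas are averaged (the only step that needs communication)."""
    deltas = []
    for X, y in shards:
        w_k = w.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w_k - y) / len(y)  # least-squares gradient
            w_k -= lr * grad
        deltas.append(w_k - w)
    return np.mean(deltas, axis=0)

def average_update(w, delta):
    """Plain local SGD global step: apply the averaged delta as-is.
    A learned optimizer would replace this function (assumption: the
    learned rule maps the averaged delta to a global update)."""
    return w + delta

def train(shards, dim, rounds=100, global_update=average_update):
    w = np.zeros(dim)
    for _ in range(rounds):
        delta = local_sgd_round(w, shards)
        w = global_update(w, delta)
    return w

# Toy setup (illustrative only): 4 workers, noiseless linear targets.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(4):
    X = rng.normal(size=(32, 3))
    shards.append((X, X @ w_true))

w_hat = train(shards, dim=3)
```

With `local_steps=4`, each worker communicates once per four gradient steps, which is the source of the communication savings; swapping `average_update` for another function changes only how the averaged delta is turned into a global step.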

Cite

@article{arxiv.2312.02204,
  title  = {Meta-learning Optimizers for Communication-Efficient Learning},
  author = {Charles-Étienne Joseph and Benjamin Thérien and Abhinav Moudgil and Boris Knyazev and Eugene Belilovsky},
  journal= {arXiv preprint arXiv:2312.02204},
  year   = {2025}
}