Communication Compression for Distributed Learning without Control Variates

Machine Learning, Signal Processing, Optimization and Control. 2025-09-12 (v2)

Abstract

Distributed learning algorithms, such as those employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads. The compression methods used in practice are often biased, making error feedback necessary both to achieve convergence under aggressive compression and to provide theoretical convergence guarantees. However, error feedback requires client-specific control variates, which creates two key challenges: it violates privacy-preserving principles and demands stateful clients. In this paper, we propose Compressed Aggregate Feedback (CAFe), a novel distributed learning framework that allows highly compressible client updates by exploiting past aggregated updates and does not require control variates. We consider Distributed Gradient Descent (DGD) as a representative algorithm and analytically prove CAFe's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-convex regime with bounded gradient dissimilarity. Experimental results confirm that CAFe outperforms existing distributed learning compression schemes.
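The idea of exploiting past aggregated updates can be illustrated with a small sketch. The abstract does not spell out the update rule, so everything below is an assumption made for illustration: each client compresses the difference between its local update and the previous aggregated update, and the server adds the averaged compressed differences back onto that previous aggregate. The `top_k` compressor, the quadratic client losses, and all names and constants are hypothetical and are not taken from the paper. Setting `use_aggregate=False` gives a plain compressed-update baseline in the spirit of DCGD.

```python
# Illustrative sketch only. The update rule, the top_k compressor, the
# quadratic client losses, and all constants are assumptions made for this
# example; they are not taken from the paper.

def top_k(vec, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def run(targets, rounds=200, lr=0.1, k=1, use_aggregate=True):
    """Distributed GD on f_n(x) = 0.5 * ||x - targets[n]||^2 with compressed uploads.

    use_aggregate=True : clients compress (update - previous aggregated update).
    use_aggregate=False: clients compress the raw update (baseline).
    Clients keep no per-client state; the only shared quantity is the previous
    aggregate, which the server already knows.
    """
    n, dim = len(targets), len(targets[0])
    x = [0.0] * dim
    prev_agg = [0.0] * dim  # last aggregated update
    for _ in range(rounds):
        # Reference vector subtracted before compression.
        ref = prev_agg if use_aggregate else [0.0] * dim
        avg_msg = [0.0] * dim
        for t in targets:
            # Local gradient step for the quadratic loss of this client.
            update = [-lr * (xi - ti) for xi, ti in zip(x, t)]
            msg = top_k([u - r for u, r in zip(update, ref)], k)
            avg_msg = [a + m / n for a, m in zip(avg_msg, msg)]
        # Server reconstructs the aggregate and applies it to the model.
        agg = [r + a for r, a in zip(ref, avg_msg)]
        x = [xi + ai for xi, ai in zip(x, agg)]
        prev_agg = agg
    return x
```

When `k` equals the model dimension, nothing is discarded and both variants reduce to uncompressed distributed gradient descent, which is a convenient sanity check for the sketch.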

Cite

@article{arxiv.2412.04538,
  title  = {Communication Compression for Distributed Learning without Control Variates},
  author = {Tomas Ortega and Chun-Yin Huang and Xiaoxiao Li and Hamid Jafarkhani},
  journal = {arXiv preprint arXiv:2412.04538},
  year   = {2025}
}

Comments

Revised format and minor exposition edits; results unchanged.