Problem-dependent convergence bounds for randomized linear gradient compression

Optimization and Control 2025-07-17 v3 Machine Learning

Abstract

In distributed optimization, communicating model updates can be a performance bottleneck. Gradient compression has therefore been proposed as a means of increasing optimization throughput. Because compression loses information, it generally increases the number of iterations needed to reach a solution. In this work, we investigate how this iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them Haar-distributed orthogonal matrices and matrices with random Gaussian entries. We find that the impact of compression on convergence can be quantified in terms of a smoothness matrix associated with the objective function, using a norm defined by the compression scheme. The analysis reveals that, in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem. In these cases, our bounds predict that the penalty introduced by compression is significantly smaller than the penalty given by worst-case bounds that consider only the compression level and ignore problem data. We verify the theoretical findings experimentally, including on the fine-tuning of an image classification model.
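To make the notion of a linear scheme concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a gradient of dimension d is compressed to k numbers by multiplying with the transpose of a random d x k matrix Q, and decompressed by multiplying with Q and rescaling. The function names, the use of only k columns, and the scaling factors (1/k for Gaussian entries, d/k for orthonormal columns, chosen so that the reconstruction is unbiased) are assumptions made for illustration; the paper may normalize differently.

```python
import numpy as np


def gaussian_matrix(d, k, rng):
    """Return a d x k matrix with i.i.d. standard normal entries."""
    return rng.standard_normal((d, k))


def haar_columns(d, k, rng):
    """Return a d x k matrix with orthonormal columns, distributed like
    the first k columns of a Haar-distributed orthogonal matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, k)))  # reduced QR: q is d x k
    # Flip column signs so that R would have a positive diagonal; this
    # removes the sign ambiguity of QR and yields the Haar distribution.
    return q * np.sign(np.diag(r))


def compress(g, Q):
    """Compress a length-d gradient to k numbers (hypothetical helper)."""
    return Q.T @ g


def decompress(m, Q, scale):
    """Map a length-k message back to dimension d (hypothetical helper)."""
    return scale * (Q @ m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 1000, 100
    g = rng.standard_normal(d)

    Q = haar_columns(d, k, rng)
    g_hat = decompress(compress(g, Q), Q, scale=d / k)
    # A single reconstruction is noisy; only its expectation equals g.
    print("relative error:", np.linalg.norm(g_hat - g) / np.linalg.norm(g))
```

Only the k-dimensional message and, for example, a shared random seed for Q would need to be communicated in such a setup, which is where the bandwidth saving comes from; the price is the reconstruction noise whose effect on convergence the abstract describes.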

Cite

@article{arxiv.2411.12898,
  title  = {Problem-dependent convergence bounds for randomized linear gradient compression},
  author = {Thomas Flynn and Patrick Johnstone and Shinjae Yoo},
  journal = {arXiv preprint arXiv:2411.12898},
  year   = {2025}
}