Partial Parameter Updates for Efficient Distributed Training

Anastasiia Filippova; Angelos Katharopoulos; David Grangier; Ronan Collobert

Machine Learning; Artificial Intelligence (v1, 2025-09-29)

Abstract

We introduce a memory- and compute-efficient method for low-communication distributed training. Existing methods reduce communication by performing multiple local updates between infrequent global synchronizations. We demonstrate that their efficiency can be significantly improved by restricting backpropagation: instead of updating all the parameters, each node updates only a fixed subset while keeping the remainder frozen during local steps. This constraint substantially reduces peak memory usage and training FLOPs, while a full forward pass over all parameters eliminates the need for cross-node activation exchange. Experiments on a 1.3B-parameter language model trained across 32 nodes show that our method matches the perplexity of prior low-communication approaches under identical token and bandwidth budgets while reducing training FLOPs and peak memory.
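
The training scheme described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: a linear least-squares model stands in for the language model, the parameters are split into fixed, disjoint subsets (one per node), local steps use plain SGD, and synchronization sets each parameter to the mean of the copies held by the nodes allowed to update it. The helper names (`local_steps`, `synchronize`, `train`) and all hyperparameters are hypothetical.

```python
import random


def predict(w, x):
    # Full forward pass: every parameter is used, whether trainable or frozen.
    return sum(wi * xi for wi, xi in zip(w, x))


def local_steps(w, data, owned, lr, steps, rng):
    """Run `steps` SGD updates on a private copy of w, changing only `owned` indices."""
    w = list(w)
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        err = predict(w, x) - y
        # Restricted backward pass: gradients are computed only for the
        # owned coordinates; frozen coordinates keep their synced values.
        for i in owned:
            w[i] -= lr * 2.0 * err * x[i]
    return w


def synchronize(local_ws, owners, dim):
    """Assumed merge rule: each coordinate becomes the mean of the copies
    held by the nodes that were allowed to update it."""
    return [sum(local_ws[k][i] for k in owners[i]) / len(owners[i])
            for i in range(dim)]


def train(num_nodes=4, dim=8, rounds=50, local_h=10, lr=0.02, seed=0):
    rng = random.Random(seed)
    w_true = [rng.uniform(-1, 1) for _ in range(dim)]
    # Each node holds its own data shard (noiseless targets for simplicity).
    shards = []
    for _ in range(num_nodes):
        xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
        shards.append([(x, predict(w_true, x)) for x in xs])
    # Fixed, disjoint partition of the parameters across nodes (an assumption).
    subsets = [list(range(k, dim, num_nodes)) for k in range(num_nodes)]
    owners = {i: [k for k in range(num_nodes) if i in subsets[k]]
              for i in range(dim)}
    w = [0.0] * dim
    for _ in range(rounds):
        # Local phase: no communication between nodes.
        local_ws = [local_steps(w, shards[k], subsets[k], lr, local_h, rng)
                    for k in range(num_nodes)]
        # Global phase: one parameter exchange per round.
        w = synchronize(local_ws, owners, dim)
    n_samples = sum(len(s) for s in shards)
    loss = sum((predict(w, x) - y) ** 2 for s in shards for x, y in s) / n_samples
    return w, w_true, loss


if __name__ == "__main__":
    w, w_true, loss = train()
    print(f"mean squared error after training: {loss:.2e}")
```

With disjoint subsets the merge reduces to copying each node's owned coordinates, so a synchronization round is a single parameter exchange, and no activations ever cross nodes because each node runs the full forward pass locally. The sketch requires `dim >= num_nodes` so that every parameter has an owner. Whether the paper uses this merge rule, an outer optimizer, or overlapping subsets is not stated in the abstract.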
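
A back-of-the-envelope estimate shows where the FLOP and memory savings of a partial update could come from. The sketch below relies on common rules of thumb for dense models rather than on figures from the paper: a forward pass costs about 2N FLOPs per token, and the backward pass costs about 2N for activation gradients plus about 2N for weight gradients, where only the weight-gradient term shrinks with the trainable fraction. Memory assumes fp32 values and Adam's two moment buffers, and ignores activations. The function name `training_cost` and the trainable fraction are hypothetical.

```python
def training_cost(num_params, trainable_fraction, bytes_per_value=4, adam_states=2):
    """Rough per-token training FLOPs and static memory (bytes) on one node.

    Rules of thumb assumed here (not taken from the paper):
      forward pass          ~ 2 * N FLOPs per token
      activation gradients  ~ 2 * N FLOPs per token (kept for the whole model)
      weight gradients      ~ 2 * N FLOPs per token, scaled by the trainable fraction
    Memory counts weights, gradients and Adam moments; activations are ignored.
    """
    n, f = num_params, trainable_fraction
    forward = 2 * n
    # Conservative: assume gradients must still flow back through frozen
    # layers to reach trainable ones earlier in the network.
    backward_activations = 2 * n
    backward_weights = 2 * f * n
    flops = forward + backward_activations + backward_weights

    weights = n * bytes_per_value                      # full model needed for the forward pass
    grads = f * n * bytes_per_value                    # gradients only for the trainable subset
    optimizer = adam_states * f * n * bytes_per_value  # e.g. Adam first/second moments
    memory = weights + grads + optimizer
    return flops, memory


if __name__ == "__main__":
    for f in (1.0, 0.25):  # full training vs. a hypothetical 25% trainable subset
        flops, mem = training_cost(1.3e9, f)
        print(f"f={f}: {flops:.3g} FLOPs/token, {mem / 1e9:.1f} GB static memory")
```

Under these assumptions, a 1.3B-parameter model with a 25% trainable subset needs roughly 5.85e9 FLOPs per token and about 9.1 GB of static memory, versus 7.8e9 FLOPs and about 20.8 GB when all parameters are trainable. The 25% fraction is illustrative only; the actual savings depend on how the paper partitions the parameters.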

Keywords

distributed optimization; distributed training; large language model training

Cite

@article{arxiv.2509.22418,
  title  = {Partial Parameter Updates for Efficient Distributed Training},
  author = {Anastasiia Filippova and Angelos Katharopoulos and David Grangier and Ronan Collobert},
  journal= {arXiv preprint arXiv:2509.22418},
  year   = {2025}
}