Related papers: Toward Communication Efficient Adaptive Gradient M…

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang, Lu Lin, Jinghui Chen

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Enhancing Communication Efficiency in FL with Adaptive Gradient Quantization and Communication Frequency Optimization

Federated Learning (FL) enables participant devices to collaboratively train deep learning models without sharing their data with the server or other devices, effectively addressing data privacy and computational concerns. However, FL faces…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Asadullah Tariq, Tariq Qayyum, Mohamed Adel Serhani, Farag Sallabi, Ikbal Taleb, Ezedin S. Barka

On the Convergence of Decentralized Adaptive Gradient Methods

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu

Communication-Efficient Adaptive Federated Learning

Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges,…

Machine Learning · Computer Science 2023-04-21 Yujia Wang, Lu Lin, Jinghui Chen

Communication-Efficient Federated Learning with Accelerated Client Gradient

Federated learning often suffers from slow and unstable convergence due to the heterogeneous characteristics of participating client datasets. Such a tendency is aggravated when the client participation ratio is low since the information…

Machine Learning · Computer Science 2024-04-02 Geeho Kim, Jinkyu Kim, Bohyung Han

Adaptive Federated Dropout: Improving Communication Efficiency and Generalization for Federated Learning

With more regulations tackling users' privacy-sensitive data protection in recent years, access to such data has become increasingly restricted and controversial. To exploit the wealth of data generated and located at distributed entities…

Machine Learning · Computer Science 2020-11-10 Nader Bouacida, Jiahui Hou, Hui Zang, Xin Liu

Meta-learning Optimizers for Communication-Efficient Learning

Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally on each worker, before averaging model parameters, helping…

Machine Learning · Computer Science 2025-06-13 Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning within synchronous parallel computing environments under…

Machine Learning · Computer Science 2024-11-07 Yoni Choukroun, Shlomi Azoulay, Pavel Kisilev

FedBCD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning

Although Federated Learning has been widely studied in recent years, there are still high overhead expenses in each communication round for large-scale models such as Vision Transformer. To lower the communication complexity, we propose a…

Machine Learning · Computer Science 2026-04-21 Junkang Liu, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Yuangang Li, YunXiang Gong

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Modern deep neural networks often require distributed training with many workers due to their large size. As the number of workers increases, communication overheads become the main bottleneck in data-parallel minibatch stochastic gradient…

Machine Learning · Statistics 2024-11-07 Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks

Federated learning effectively addresses issues such as data privacy by collaborating across participating devices to train global models. However, factors such as network topology and device computing power can affect its training or…

Machine Learning · Computer Science 2023-11-29 Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network…

Machine Learning · Computer Science 2024-10-25 Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz

Communication-Efficient Approximate Gradient Coding for Distributed Learning in Heterogeneous Systems

We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that…

Systems and Control · Electrical Eng. & Systems 2026-05-18 Heekang Song, Wan Choi

Communication-Efficient Learning of Deep Networks from Decentralized Data

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image…

Machine Learning · Computer Science 2023-01-30 H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas

Federated learning over physical channels: adaptive algorithms with near-optimal guarantees

In federated learning, communication cost can be significantly reduced by transmitting the information over the air through physical channels. In this paper, we propose a new class of adaptive federated stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2025-09-03 Rui Zhang, Wenlong Mou

Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates

When scaling distributed training, the communication overhead is often the bottleneck. In this paper, we propose a novel SGD variant with reduced communication and adaptive learning rates. We prove the convergence of the proposed algorithm…

Machine Learning · Computer Science 2020-12-08 Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Haibin Lin