Related papers: Quantized Epoch-SGD for Communication-Efficient Di…

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh , Demjan Grubic , Jerry Li , Ryota Tomioka , Milan Vojnovic

Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm

We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), an efficient communication gradient compression algorithm designed for distributed training. LQ-SGD further develops on the basis of PowerSGD by incorporating the low-rank…

Machine Learning · Computer Science 2025-06-24 Hongyang Li , Lincen Bai , Caesar Wu , Mohammed Chadli , Said Mammar , Pascal Bouvry

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

Gradient quantization is an emerging technique in reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…

Machine Learning · Computer Science 2021-08-02 Guangfeng Yan , Shao-Lun Huang , Tian Lan , Linqi Song

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu , Weidong Huang , Junzhou Huang , Tong Zhang

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-05 Ali Ramezani-Kebrya , Fartash Faghri , Ilya Markov , Vitalii Aksenov , Dan Alistarh , Daniel M. Roy

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-24 Ali Ramezani-Kebrya , Fartash Faghri , Ilya Markov , Vitalii Aksenov , Dan Alistarh , Daniel M. Roy

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning

Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the…

Machine Learning · Computer Science 2021-01-20 Peng Jiang , Gagan Agrawal

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang , Vikram Ramanathan , Junshan Zhang , Na Li

Truncated Non-Uniform Quantization for Distributed SGD

To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The…

Machine Learning · Computer Science 2024-02-05 Guangfeng Yan , Tan Li , Yuanzhang Xiao , Congduan Li , Linqi Song

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine…

Machine Learning · Statistics 2020-01-06 Weiyu Li , Tianyi Chen , Liping Li , Zhaoxian Wu , Qing Ling

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic…

Machine Learning · Computer Science 2019-09-23 Shuheng Shen , Linli Xu , Jingchang Liu , Xianfeng Liang , Yifei Cheng

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization

Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent…

Optimization and Control · Mathematics 2019-05-13 Hao Yu , Rong Jin , Sen Yang

On the Convergence of Quantized Parallel Restarted SGD for Central Server Free Distributed Training

Communication is a crucial phase in the context of distributed training. Because parameter server (PS) frequently experiences network congestion, recent studies have found that training paradigms without a centralized server outperform the…

Optimization and Control · Mathematics 2020-12-17 Feijie Wu , Shiqi He , Yutong Yang , Haozhao Wang , Zhihao Qu , Song Guo , Weihua Zhuang

Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu , Thomas Flynn , Shinjae Yoo , Nicholas D'Imperio

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi , Martin Jaggi

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan , Guangfeng Yan , Yuanzhang Xiao , Linqi Song , Weitao Xu

FastSGD: A Fast Compressed SGD Framework for Distributed Machine Learning

With the rapid increase of big data, distributed Machine Learning (ML) has been widely applied in training large-scale models. Stochastic Gradient Descent (SGD) is arguably the workhorse algorithm of ML. Distributed ML models trained by SGD…

Machine Learning · Computer Science 2021-12-09 Keyu Yang , Lu Chen , Zhihao Zeng , Yunjun Gao

A Distributed SGD Algorithm with Global Sketching for Deep Learning Training Acceleration

Distributed training is an effective way to accelerate the training process of large-scale deep learning models. However, the parameter exchange and synchronization of distributed stochastic gradient descent introduce a large amount of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-16 LingFei Dai , Boyu Diao , Chao Li , Yongjun Xu

Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this…

Machine Learning · Computer Science 2025-07-30 Jihao Xin , Marco Canini , Peter Richtárik , Samuel Horváth

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders…

Machine Learning · Computer Science 2018-11-30 Sebastian U. Stich , Jean-Baptiste Cordonnier , Martin Jaggi