Related papers: Nested Dithered Quantization for Communication Red…

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…

Machine Learning · Computer Science 2021-08-02 Guangfeng Yan, Shao-Lun Huang, Tian Lan, Linqi Song

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

A Distributed Training Algorithm of Generative Adversarial Networks with Quantized Gradients

Training generative adversarial networks (GANs) in a distributed fashion is a promising technology, since it helps train GANs on massive amounts of data efficiently in real-world applications. However, GANs are known to be difficult to…

Machine Learning · Computer Science 2020-10-27 Xiaojun Chen, Shu Yang, Li Shen, Xuanrong Pang

Truncated Non-Uniform Quantization for Distributed SGD

To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The…

Machine Learning · Computer Science 2024-02-05 Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li, Linqi Song

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even…

Computer Vision and Pattern Recognition · Computer Science 2020-06-24 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

Nested Distributed Gradient Methods with Adaptive Quantized Communication

In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where communication can be costly. We propose and analyze a class of nested distributed gradient methods with adaptive quantized…

Optimization and Control · Mathematics 2019-08-28 Albert S. Berahas, Charikleia Iakovidou, Ermin Wei

Quantized Epoch-SGD for Communication-Efficient Distributed Learning

Due to its efficiency and ease of implementation, stochastic gradient descent (SGD) has been widely used in machine learning. In particular, SGD is one of the most popular optimization methods for distributed learning. Recently, quantized SGD…

Machine Learning · Computer Science 2019-01-11 Shen-Yi Zhao, Hao Gao, Wu-Jun Li

DQT: Dynamic Quantization Training via Dequantization-Free Nested Integer Arithmetic

The deployment of deep neural networks on resource-constrained devices relies on quantization. While static, uniform quantization applies a fixed bit-width to all inputs, it fails to adapt to their varying complexity. Dynamic,…

Machine Learning · Computer Science 2026-03-24 Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Francesca Palermo, Diana Trojaniello, Manuel Roveri

DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging

The state-of-the-art deep learning algorithms rely on distributed training systems to tackle the increasing sizes of models and training data sets. The minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-02 Qinggang Zhou, Yawen Zhang, Pengcheng Li, Xiaoyong Liu, Jun Yang, Runsheng Wang, Ru Huang

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

Convergence Theory of Generalized Distributed Subgradient Method with Random Quantization

The distributed subgradient method (DSG) is a widely discussed algorithm for large-scale distributed optimization problems arising in machine learning applications. Most existing works on DSG focus on ideal communication…

Signal Processing · Electrical Eng. & Systems 2022-08-24 Zhaoyue Xia, Jun Du, Yong Ren

Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu, Thomas Flynn, Shinjae Yoo, Nicholas D'Imperio

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across…

Machine Learning · Computer Science 2023-06-05 Borui Wan, Juntao Zhao, Chuan Wu

vqSGD: Vector Quantized Stochastic Gradient Descent

In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order…

Machine Learning · Computer Science 2020-12-29 Venkata Gandikota, Daniel Kane, Raj Kumar Maity, Arya Mazumdar

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Differentially Quantized Gradient Methods

Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay…

Machine Learning · Computer Science 2022-04-28 Chung-Yi Lin, Victoria Kostina, Babak Hassibi

Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

As a crucial scheme to accelerate the deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-17 Xiang Yang

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-05 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-24 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine…

Machine Learning · Statistics 2020-01-06 Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling