Related papers: Quantizing data for distributed learning

Variance-based Gradient Compression for Efficient Distributed Deep Learning

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

Distributed Learning with Compressed Gradient Differences

Training large machine learning models requires a distributed computing approach, with communication of the model updates being the bottleneck. For this reason, several methods based on the compression (e.g., sparsification and/or…

Machine Learning · Computer Science 2023-12-29 Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

Fast Convergence Rates of Distributed Subgradient Methods with Adaptive Quantization

We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. Recent advances using the distributed gradient…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

Compressed Communication for Distributed Training: Adaptive Methods and System

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

One of the most significant bottlenecks in training large-scale machine learning models on a parameter server (PS) is the communication overhead, because it needs to frequently exchange the model gradients between the workers and servers…

Machine Learning · Computer Science 2018-04-25 Guoxin Cui, Jun Xu, Wei Zeng, Yanyan Lan, Jiafeng Guo, Xueqi Cheng

Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients

The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication. The key idea is to first quantize the computed gradients, and then skip less informative…

Machine Learning · Computer Science 2019-09-18 Jun Sun, Tianyi Chen, Georgios B. Giannakis, Zaiyue Yang

Distributed Stochastic Approximation for Solving Network Optimization Problems Under Random Quantization

We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. This imperfect communication poses a fundamental…

Optimization and Control · Mathematics 2018-10-30 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

The communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications. In particular, the growing size of deep learning models leads to higher communication overheads that defy the…

Machine Learning · Computer Science 2020-02-26 An Xu, Zhouyuan Huo, Heng Huang

Quantization Design for Distributed Optimization

We consider the problem of solving a distributed optimization problem using a distributed computing platform, where the communication in the network is limited: each node can only communicate with its neighbours and the channel has a…

Systems and Control · Computer Science 2015-04-10 Ye Pu, Melanie N. Zeilinger, Colin N. Jones

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

Neural Networks with Quantization Constraints

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science 2022-10-28 Ignacio Hounie, Juan Elenter, Alejandro Ribeiro

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

Gradient quantization is an emerging technique in reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…

Machine Learning · Computer Science 2021-08-02 Guangfeng Yan, Shao-Lun Huang, Tian Lan, Linqi Song

Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes…

Optimization and Control · Mathematics 2019-04-11 Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

Quantized Adaptive Subgradient Algorithms and Their Applications

Data explosion and an increase in model size drive the remarkable advances in large-scale machine learning, but also make model training time-consuming and model storage difficult. To address the above issues in the distributed model…

Machine Learning · Computer Science 2022-08-12 Ke Xu, Jianqiao Wangni, Yifan Zhang, Deheng Ye, Jiaxiang Wu, Peilin Zhao

Wyner-Ziv Gradient Compression for Federated Learning

Due to limited communication resources at the client and a massive number of model parameters, large-scale distributed learning tasks suffer from communication bottleneck. Gradient compression is an effective method to reduce communication…

Machine Learning · Computer Science 2021-11-17 Kai Liang, Huiru Zhong, Haoning Chen, Youlong Wu

MetaGrad: Adaptive Gradient Quantization with Hypernetworks

A popular track of network compression approaches is Quantization-Aware Training (QAT), which accelerates the forward pass during neural network training and inference. However, not many prior efforts have been made to quantize and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Kaixin Xu, Alina Hui Xiu Lee, Ziyuan Zhao, Zhe Wang, Min Wu, Weisi Lin

Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning

Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with…

Machine Learning · Computer Science 2024-02-07 Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song