Related papers: Communication-Compressed Adaptive Gradient Method …
We study COMP-AMS, a distributed optimization framework based on gradient averaging and adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process.…
Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…
In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression…
A stochastic gradient method for synchronous distributed optimization is studied. For reducing communication cost, we particularly focus on utilization of compression of communicated gradients. Several work has shown that {\it{sparsified}}…
Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods…
Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…
In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of…
We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optimization algorithm for communication-efficient training of supervised machine learning models with adaptive compression level. Our approach is inspired by the recently…
Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…
Distributed data mining is an emerging research topic to effectively and efficiently address hard data mining tasks using big data, which are partitioned and computed on different worker nodes, instead of one centralized server.…
In this paper, we design a novel distributed learning algorithm using stochastic compressed communications. In detail, we pursue a modular approach, merging ADMM and a gradient-based approach, benefiting from the robustness of the former…
Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification,…
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of…
Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…
Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combining with parallel communication mechanism method like pipeline, gradient…
Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients,…
We propose a new variant of AMSGrad, a popular adaptive gradient based optimization algorithm widely used for training deep neural networks. Our algorithm adds prior knowledge about the sequence of consecutive mini-batch gradients and…
We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…
In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm,…
Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known…