Related papers: Communication-Efficient Distributed SGD using Prea…

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling

This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD). Considering the actual communication cost or delay caused by…

Machine Learning · Computer Science 2023-10-26 Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson

Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access

In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In…

Networking and Internet Architecture · Computer Science 2023-07-10 Zheng Chen, Martin Dahl, Erik G. Larsson

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine…

Machine Learning · Statistics 2020-01-06 Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm,…

Machine Learning · Computer Science 2020-03-30 Naeimeh Omidvar, Mohammad Ali Maddah-Ali, Hamed Mahdavi

Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks

Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging,…

Information Theory · Computer Science 2025-02-12 Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson

Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or…

Machine Learning · Computer Science 2022-08-08 Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency

Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk…

Machine Learning · Computer Science 2021-03-01 Yuyang Deng, Mehrdad Mahdavi

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Local SGD Converges Fast and Communicates Little

Mini-batch stochastic gradient descent (SGD) is state of the art in large scale distributed training. The scheme can reach a linear speedup with respect to the number of workers, but this is rarely seen in practice as the scheme often…

Optimization and Control · Mathematics 2019-05-06 Sebastian U. Stich

Avoiding Communication in Logistic Regression

Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems. SGD solves an optimization problem by iteratively sampling a few data points from the input data, computing…

Machine Learning · Computer Science 2020-11-18 Aditya Devarakonda, James Demmel

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization

Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent…

Optimization and Control · Mathematics 2019-05-13 Hao Yu, Rong Jin, Sen Yang

Communication-efficient distributed SGD with Sketching

Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming…

Machine Learning · Computer Science 2020-01-24 Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora

Communication-efficient SGD: From Local SGD to One-Shot Averaging

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $N$ workers, who can take SGD steps and coordinate with a central server. While it is…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-28 Artin Spiridonoff, Alex Olshevsky, Ioannis Ch. Paschalidis

cpSGD: Communication-efficient and differentially-private distributed SGD

Distributed stochastic gradient descent is an important subroutine in distributed learning. A setting of particular interest is when the clients are mobile devices, where two important concerns are communication efficiency and the privacy…

Machine Learning · Statistics 2018-05-29 Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan

S-NEAR-DGD: A Flexible Distributed Stochastic Gradient Method for Inexact Communication

We present and analyze a stochastic distributed method (S-NEAR-DGD) that can tolerate inexact computation and inaccurate information exchange to alleviate the problems of costly gradient evaluations and bandwidth-limited communication in…

Optimization and Control · Mathematics 2021-02-02 Charikleia Iakovidou, Ermin Wei

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu

Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks

Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most well-known…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-10 Xin Zhang, Jia Liu, Zhengyuan Zhu, Elizabeth S. Bentley

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air

We study federated machine learning (ML) at the wireless edge, where power- and bandwidth-limited wireless devices with local datasets carry out distributed stochastic gradient descent (DSGD) with the help of a remote parameter server (PS).…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 Mohammad Mohammadi Amiri, Deniz Gunduz