Related papers: Avoiding Communication in Logistic Regression

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm,…

Machine Learning · Computer Science 2020-03-30 Naeimeh Omidvar , Mohammad Ali Maddah-Ali , Hamed Mahdavi

Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large scale learning, as utilized by many Big Data…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine…

Machine Learning · Statistics 2020-01-06 Weiyu Li , Tianyi Chen , Liping Li , Zhaoxian Wu , Qing Ling

Communication-efficient SGD: From Local SGD to One-Shot Averaging

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $N$ workers, who can take SGD steps and coordinate with a central server. While it is…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-28 Artin Spiridonoff , Alex Olshevsky , Ioannis Ch. Paschalidis

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning

Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the…

Machine Learning · Computer Science 2021-01-20 Peng Jiang , Gagan Agrawal

Escaping Saddle Points with Compressed SGD

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck…

Machine Learning · Computer Science 2021-05-24 Dmitrii Avdiukhin , Grigory Yaroslavtsev

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan , Guangfeng Yan , Yuanzhang Xiao , Linqi Song , Weitao Xu

Local SGD Converges Fast and Communicates Little

Mini-batch stochastic gradient descent (SGD) is state of the art in large scale distributed training. The scheme can reach a linear speedup with respect to the number of workers, but this is rarely seen in practice as the scheme often…

Optimization and Control · Mathematics 2019-05-06 Sebastian U. Stich

Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency

Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk…

Machine Learning · Computer Science 2021-03-01 Yuyang Deng , Mehrdad Mahdavi

CADA: Communication-Adaptive Distributed Adam

Stochastic gradient descent (SGD) has taken the stage as the primary workhorse for large-scale machine learning. It is often used with its adaptive variants such as AdaGrad, Adam, and AMSGrad. This paper proposes an adaptive stochastic…

Machine Learning · Computer Science 2021-01-01 Tianyi Chen , Ziye Guo , Yuejiao Sun , Wotao Yin

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile , Matteo Raviola , Nathan Schaeffer

A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity

SGD (Stochastic Gradient Descent) is a popular algorithm for large scale optimization problems due to its low iterative cost. However, SGD can not achieve linear convergence rate as FGD (Full Gradient Descent) because of the inherent…

Machine Learning · Computer Science 2017-12-05 Aixiang Chen , Bingchuan Chen , Xiaolong Chai , Rui Bian , Hengguang Li

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

Distributed-memory implementations of numerical optimization algorithm, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda , Ramakrishnan Kannan

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin , Ohad Shamir , Karthik Sridharan

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

Distributed stochastic gradient descent (SGD) is essential for scaling the machine learning algorithms to a large number of computing nodes. However, the infrastructures variability such as high communication delay or random node slowdown…

Machine Learning · Computer Science 2020-02-25 Jianyu Wang , Hao Liang , Gauri Joshi

Stochastic Gradient Descent Meets Distribution Regression

Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems. Here, we focus on distribution regression (DR), involving two stages of sampling: Firstly, we regress from…

Machine Learning · Statistics 2021-03-08 Nicole Mücke

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu , Zhenxun Zhuang , Yunwei Lei , Chunyang Liao

Gradient and Variable Tracking with Multiple Local SGD for Decentralized Non-Convex Learning

Stochastic distributed optimization methods that solve an optimization problem over a multi-agent network have played an important role in a variety of large-scale signal processing and machine leaning applications. Among the existing…

Optimization and Control · Mathematics 2023-02-06 Songyang Ge , Tsung-Hui Chang

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders…

Machine Learning · Computer Science 2018-11-30 Sebastian U. Stich , Jean-Baptiste Cordonnier , Martin Jaggi