Related papers: SYNTHESIS: A Semi-Asynchronous Path-Integrated Sto…

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…

Optimization and Control · Mathematics 2020-07-14 Vyacheslav Kungurtsev , Malcolm Egan , Bapi Chatterjee , Dan Alistarh

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

We study the asynchronous stochastic gradient descent algorithm for distributed training over $n$ workers which have varying computation and communication frequency over time. In this algorithm, workers compute stochastic gradients in…

Machine Learning · Computer Science 2022-06-17 Anastasia Koloskova , Sebastian U. Stich , Martin Jaggi

Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang , Shuai Zheng , James T. Kwok

Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the…

Machine Learning · Computer Science 2025-02-27 Siyuan Yu , Wei Chen , H. Vincent Poor

High Throughput Synchronous Distributed Stochastic Gradient Descent

We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-14 Michael Teng , Frank Wood

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and…

Optimization and Control · Mathematics 2019-04-22 Xiangru Lian , Yijun Huang , Yuncheng Li , Ji Liu

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training

The training of modern deep learning neural network calls for large amounts of computation, which is often provided by GPUs or other specific accelerators. To scale out to achieve faster training speed, two update algorithms are mainly…

Machine Learning · Computer Science 2020-05-15 Yemao Xu , Dezun Dong , Weixia Xu , Xiangke Liao

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat , Hamid Reza Feyzmahdavian , Mikael Johansson

Asynchronous Stochastic Optimization Robust to Arbitrary Delays

We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous…

Optimization and Control · Mathematics 2021-11-16 Alon Cohen , Amit Daniely , Yoel Drori , Tomer Koren , Mariano Schain

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction

In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a…

Optimization and Control · Mathematics 2020-05-04 Prashant Khanduri , Pranay Sharma , Swatantra Kafle , Saikiran Bulusu , Ketan Rajawat , Pramod K. Varshney

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang , Suyog Gupta , Xiangru Lian , Ji Liu

Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks

We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a…

Numerical Analysis · Computer Science 2015-09-01 N. Denizcan Vanli , Muhammed O. Sayin , Suleyman S. Kozat

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state of the art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous…

Optimization and Control · Mathematics 2015-05-20 Hamid Reza Feyzmahdavian , Arda Aytekin , Mikael Johansson

MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-11 Karl Bäckström , Marina Papatriantafilou , Philippas Tsigas

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu , Zhenxun Zhuang , Yunwei Lei , Chunyang Liao

Distributed Delayed Stochastic Optimization

We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization…

Optimization and Control · Mathematics 2011-05-02 Alekh Agarwal , John C. Duchi

New Penalized Stochastic Gradient Methods for Linearly Constrained Strongly Convex Optimization

For minimizing a strongly convex objective function subject to linear inequality constraints, we consider a penalty approach that allows one to utilize stochastic methods for problems with a large number of constraints and/or objective…

Optimization and Control · Mathematics 2022-02-16 Meng Li , Paul Grigas , Alper Atamturk

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

Stochastic gradient descent (SGD) holds as a classical method to build large scale machine learning models over big data. A stochastic gradient is typically calculated from a limited number of samples (known as mini-batch), so it…

Machine Learning · Computer Science 2016-01-14 Yadong Mu , Wei Liu , Wei Fan

Distributed Optimization and Learning for Automated Stepsize Selection with Finite Time Coordination

Distributed optimization and learning algorithms are designed to operate over large scale networks enabling processing of vast amounts of data effectively and efficiently. One of the main challenges for ensuring a smooth learning process in…

Systems and Control · Electrical Eng. & Systems 2026-01-21 Apostolos I. Rikos , Nicola Bastianello , Themistoklis Charalambous , Karl H. Johansson