Related papers: Advances in Asynchronous Parallel and Distributed …
Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…
Asynchronous parallel optimization has received substantial success and extensive attention recently. One of the core theoretical questions is how much speedup (or benefit) asynchronous parallelization can bring us. This paper provides a…
Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and…
Distributed optimization has attracted lots of attention in the operation of power systems in recent years, where a large area is decomposed into smaller control regions each solving a local optimization problem with periodic information…
One of the most important problems in the field of distributed optimization is the problem of minimizing a sum of local convex objective functions over a networked system. Most of the existing work in this area focuses on developing…
Large scale, non-convex optimization problems arising in many complex networks such as the power system call for efficient and scalable distributed optimization algorithms. Existing distributed methods are usually iterative and require…
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony…
We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for…
Recent years have witnessed the surge of asynchronous parallel (async-parallel) iterative algorithms due to problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed…
System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool…
Existing asynchronous distributed optimization algorithms often use diminishing step-sizes that cause slow practical convergence, or use fixed step-sizes that depend on and decrease with an upper bound of the delays. Not only are such delay…
We present a parallelized primal-dual algorithm for solving constrained convex optimization problems. The algorithm is "block-based," in that vectors of primal and dual variables are partitioned into blocks, each of which is updated only by…
In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms…
We describe several features of parallel or distributed asynchronous iterative algorithms, such as unbounded delays, possible out-of-order messages, or flexible communication. We concentrate on the concept of macroiteration sequence which was…
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization…
Emerging workloads, such as graph processing and machine learning, are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines…
Existing asynchronous distributed optimization algorithms often use diminishing step-sizes that cause slow practical convergence, or fixed step-sizes that depend on an assumed upper bound of delays. Not only is such a delay bound hard to…
The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…
Decentralized optimization enables multiple devices to learn a global machine learning model while each individual device only has access to its local dataset. By avoiding the need for training data to leave individual users' devices, it…