Related papers: A Primal-Dual SGD Algorithm for Distributed Noncon…
This paper investigates accelerating the convergence of distributed optimization algorithms on non-convex problems. We propose a distributed primal-dual stochastic gradient descent~(SGD) equipped with "powerball" method to accelerate. We…
This paper studies distributed nonconvex optimization problems with stochastic gradients for a multi-agent system, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed information exchange. We…
This paper considers the distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of local cost functions by using local information exchange. We first consider a distributed first-order primal-dual…
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…
This paper considers distributed optimization for minimizing the average of local nonconvex cost functions, by using local information exchange over undirected communication networks. To reduce the required communication capacity, we…
This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information…
In this paper we consider distributed optimization problems in which the cost function is separable, i.e., a sum of possibly non-smooth functions all sharing a common variable, and can be split into a strongly convex term and a convex one.…
Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex…
This technical note studies a class of distributed nonsmooth convex consensus optimization problem. The cost function is a summation of local cost functions which are convex but nonsmooth. Each of the local cost functions consists of a…
The article discusses distributed gradient-descent algorithms for computing local and global minima in nonconvex optimization. For local optimization, we focus on distributed stochastic gradient descent (D-SGD)--a simple network-based…
In this paper, we consider a stochastic distributed nonconvex optimization problem with the cost function being distributed over $n$ agents having access only to zeroth-order (ZO) information of the cost. This problem has various machine…
Local SGD is a promising approach to overcome the communication overhead in distributed learning by reducing the synchronization frequency among worker nodes. Despite the recent theoretical advances of local SGD in empirical risk…
In this paper, we develop a distributed algorithm for solving a class of distributed convex optimization problems where the local objective functions can be a general nonsmooth function, and all equalities and inequalities are network-wide…
The primal-dual distributed optimization methods have broad large-scale machine learning applications. Previous primal-dual distributed methods are not applicable when the dual formulation is not available, e.g. the sum-of-non-convex…
This paper studies distributed stochastic nonconvex optimization problems with compressed communication and differential privacy, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed…
We study diffusion and consensus based optimization of a sum of unknown convex objective functions over distributed networks. The only access to these functions is through stochastic gradient oracles, each of which is only available at a…
Stochastic gradient descent (SGD) is one of the most widely used optimization methods for parallel and distributed processing of large datasets. One of the key limitations of distributed SGD is the need to regularly communicate the…
In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm,…
Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning. While such complex problems can often be successfully optimized in practice by using stochastic gradient descent (SGD), theoretical analysis…
Communication overhead is one of the key challenges that hinders the scalability of distributed optimization algorithms. In this paper, we study local distributed SGD, where data is partitioned among computation nodes, and the computation…