Related papers: Quantization Avoids Saddle Points in Distributed O…
In this paper, we study unconstrained distributed optimization strongly convex problems, in which the exchange of information in the network is captured by a directed graph topology over digital channels that have limited capacity (and…
The problem of saddle-point avoidance for non-convex optimization is quite challenging in large scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated…
This paper focuses on the distributed optimization of stochastic saddle point problems. The first part of the paper is devoted to lower bounds for the centralized and decentralized distributed methods for smooth (strongly) convex-(strongly)…
Privacy protection and nonconvexity are two challenging problems in decentralized optimization and learning involving sensitive data. Despite some recent advances addressing each of the two problems separately, no results have been reported…
Non-convex optimization is a critical tool in advancing machine learning, especially for complex models like deep neural networks and support vector machines. Despite challenges such as multiple local minima and saddle points, non-convex…
We propose a regularized saddle-point algorithm for convex networked optimization problems with resource allocation constraints. Standard distributed gradient methods suffer from slow convergence and require excessive communication when…
We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. This imperfect communication poses a fundamental…
Recently, the problem of local minima in very high dimensional non-convex optimization has been challenged and the problem of saddle points has been introduced. This paper introduces a dynamic type of normalization that forces the system to…
We study non-smooth stochastic decentralized optimization problems over time-varying networks, where objective functions are distributed across nodes and network connections may intermittently appear or break. Specifically, we consider two…
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…
By enabling multiple agents to cooperatively solve a global optimization problem in the absence of a central coordinator, decentralized stochastic optimization is gaining increasing attention in areas as diverse as machine learning,…
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…
Nonconvex optimization problems such as the ones in training deep neural networks suffer from a phenomenon called saddle point proliferation. This means that there are a vast number of high error saddle points present in the loss function.…
The diffusion strategy for distributed learning from streaming data employs local stochastic gradient updates along with exchange of iterates over neighborhoods. In Part I [2] of this work we established that agents cluster around a network…
In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points and converges to local minima in nonconvex problems. However, similar guarantees are lacking for distributed first-order algorithms. The…
We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to…
We study optimization algorithms for the finite sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear…
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable…
Optimizing non-convex functions is of primary importance in the vast majority of machine learning algorithms. Even though many gradient descent based algorithms have been studied, successive convex approximation based algorithms have been…
Saddle-point or primal-dual methods have recently attracted renewed interest as a systematic technique to design distributed algorithms which solve convex optimization problems. When implemented online for streaming data or as dynamic…