Related papers: Escaping From Saddle Points Using Asynchronous Coo…
Optimizing non-convex functions is of primary importance in the vast majority of machine learning algorithms. Even though many gradient descent based algorithms have been studied, successive convex approximation based algorithms have been…
Asynchronous algorithms have attracted much attention recently due to the crucial demands on solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we…
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…
Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in…
Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic…
Local search heuristics for non-convex optimizations are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle…
In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting and saddle-point avoiding. To handle…
Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of…
Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of…
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from…
The convergence behaviour of first-order methods can be severely slowed down when applied to high-dimensional non-convex functions due to the presence of saddle points. If, additionally, the saddles are surrounded by large plateaus, it is…
We study the problem of minimizing the sum of potentially non-differentiable convex cost functions with partially overlapping dependences in an asynchronous manner, where communication in the network is not coordinated. We study the…
Two classes of methods have been proposed for escaping from saddle points with one using the second-order information carried by the Hessian and the other adding the noise into the first-order information. The existing analysis for…
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently.…
In machine learning, asynchronous parallel stochastic gradient descent (APSGD) is broadly used to speed up the training process through multi-workers. Meanwhile, the time delay of stale gradients in asynchronous algorithms is generally…
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior…
We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for…
In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points and converges to local minima in nonconvex problems. However, similar guarantees are lacking for distributed first-order algorithms. The…
One of the most attractive recent approaches to processing well-structured large-scale convex optimization problems is based on smooth convex-concave saddle point reformu-lation of the problem of interest and solving the resulting problem…