Related papers: First-order Methods Almost Always Avoid Saddle Poi…
In a series of papers \cite{LSJR16, PP17, LPP}, it was established that some of the most commonly used first-order methods, with a sufficiently small step size, almost surely (under random initialization) avoid strict saddle points, as…
The analysis of the global stability of the Riemannian gradient descent method in manifold optimization (i.e., that it avoids strict saddle points for almost all initializations), due to Lee et al. (Math. Program. 176:311-337), is corrected. Moreover,…
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from…
Optimizing non-convex functions is of primary importance in the vast majority of machine learning algorithms. Even though many gradient descent based algorithms have been studied, successive convex approximation based algorithms have been…
Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been shown that many first-order methods always avoid saddle points with random starting points. In this paper, we answer a…
We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients. For a wide variety of gradient approximators based on finite…
We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice,…
Two classes of methods have been proposed for escaping from saddle points: one uses the second-order information carried by the Hessian, and the other adds noise to the first-order information. The existing analysis for…
Local search heuristics for non-convex optimization are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle…
Large-scale non-convex optimization problems are expensive to solve due to computational and memory costs. To reduce the costs, first-order (computationally efficient) and asynchronous-parallel (memory efficient) algorithms are necessary to…
Saddle points constitute a crucial challenge for first-order gradient descent algorithms. In classical machine learning, they are avoided, for example, by means of stochastic gradient descent methods. In this work, we provide…
We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire…
First-order methods for minimization and saddle point (min-max) problems are widely used for solving large-scale problems, in particular arising in machine learning. The majority of works obtain favorable complexity guarantees of such…
Recently, the problem of local minima in very high dimensional non-convex optimization has been challenged and the problem of saddle points has been introduced. This paper introduces a dynamic type of normalization that forces the system to…
Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of…
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate…
The convergence behaviour of first-order methods can be severely slowed down when applied to high-dimensional non-convex functions due to the presence of saddle points. If, additionally, the saddles are surrounded by large plateaus, it is…
This study introduces two second-order methods designed to provably avoid saddle points in composite nonconvex optimization problems: (i) a nonsmooth trust-region method and (ii) a curvilinear linesearch method. These developments are…
Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that many first-order optimization algorithms will almost always escape saddle points. However, functions exist in…
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…
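The claim that recurs across several of the abstracts above, that gradient descent with a small enough step size avoids strict saddle points for almost all initializations, can be illustrated with a minimal sketch. The test function f(x, y) = 0.5x² + 0.25y⁴ − 0.5y², the step size, the iteration count, and the sampling box are all assumptions chosen for illustration and are not taken from any of the listed papers. This f has a strict saddle at the origin and minima at (0, ±1); only starting points with y = 0 exactly, a measure-zero set, are attracted to the saddle.

```python
import numpy as np

def grad(p):
    """Gradient of the toy function f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    x, y = p
    return np.array([x, y**3 - y])

def gradient_descent(p0, step=0.1, iters=2000):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad(p)
    return p

# Random initializations drawn uniformly from the box [-1.5, 1.5]^2.
rng = np.random.default_rng(0)
starts = rng.uniform(-1.5, 1.5, size=(200, 2))
ends = np.array([gradient_descent(s) for s in starts])

# Every random start should end at one of the minima (0, +1) or (0, -1).
reached_minimum = bool(np.all(
    np.isclose(ends[:, 0], 0.0, atol=1e-6)
    & np.isclose(np.abs(ends[:, 1]), 1.0, atol=1e-6)
))

# A start on the saddle's stable manifold (y = 0) stays on it and
# converges to the saddle at the origin instead.
on_stable_manifold = gradient_descent([1.0, 0.0])

print("all random starts reached a minimum:", reached_minimum)
print("start with y = 0 ends at:", on_stable_manifold)
```

The sketch only shows the qualitative behaviour on one hand-picked function; it says nothing about how long escaping a saddle takes, which is the subject of the perturbed and second-order methods mentioned in the abstracts.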