First-order Methods Almost Always Avoid Saddle Points

Machine Learning | 2017-10-23 | v1 | Machine Learning, Optimization and Control

Abstract

We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
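As a small illustration of the claim (not taken from the paper), the sketch below runs plain gradient descent on a hypothetical test function f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1). The function, the step size of 0.1, the iteration count, and the sampling box are all assumptions chosen for this example; on the box [-1, 1]^2 the gradient is Lipschitz with constant 2, so the step size stays below 1/L there. Random initializations should end up at a minimum, while an initialization placed exactly on the line y = 0 (the saddle's stable manifold for this function, a set of measure zero) converges to the saddle.

```python
import random


def grad(x, y):
    # Gradient of the assumed test function
    # f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2.
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=2000):
    # Plain gradient descent with a fixed step size (assumed value).
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


rng = random.Random(0)  # fixed seed so the run is repeatable

# Random initializations in [-1, 1]^2: count runs ending at (0, 1) or (0, -1).
results = [
    gradient_descent(rng.uniform(-1, 1), rng.uniform(-1, 1))
    for _ in range(100)
]
reached_min = sum(
    abs(x) < 1e-6 and abs(abs(y) - 1.0) < 1e-6 for x, y in results
)

# Initialization exactly on the stable manifold y = 0: the y-gradient is
# zero there, so the iterates stay on the line and slide into the saddle.
on_manifold = gradient_descent(0.5, 0.0)

print("runs ending at a minimum:", reached_min, "of", len(results))
print("start on y = 0 ends at:", on_manifold)
```

The exceptional starting points here form a line in the plane, which is why a randomly drawn initialization essentially never lands on them; no second-order information or injected noise is used in the loop.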

Cite

@article{arxiv.1710.07406,
  title  = {First-order Methods Almost Always Avoid Saddle Points},
  author = {Jason D. Lee and Ioannis Panageas and Georgios Piliouras and Max Simchowitz and Michael I. Jordan and Benjamin Recht},
  journal= {arXiv preprint arXiv:1710.07406},
  year   = {2017}
}