First-order Methods Almost Always Avoid Saddle Points
2017-10-23 v1 Machine Learning
Optimization and Control
Abstract
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
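As a rough illustration of the claim, the sketch below runs plain gradient descent from random starting points on a toy objective. The objective f(x, y) = x^2/2 + y^4/4 - y^2/2, the step size, the iteration count, and all function names are assumptions chosen for this example; none of them are taken from the paper. The function has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1), and the only starting points that converge to the saddle lie on the line y = 0, a set of measure zero.

```python
import random


def grad(x, y):
    # Gradient of the toy objective f(x, y) = x**2/2 + y**4/4 - y**2/2
    # (an assumed example, not taken from the paper).
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=2000):
    # Plain gradient descent with a fixed step size. The step is chosen
    # below 1/L, where L = 2 bounds the Hessian on the region |y| <= 1.
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def run_trials(n=100, seed=0):
    # Draw starting points uniformly from the square [-1, 1] x [-1, 1]
    # and record where gradient descent ends up.
    rng = random.Random(seed)
    finals = []
    for _ in range(n):
        x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        finals.append(gradient_descent(x0, y0))
    return finals


finals = run_trials()

# Count runs that end at one of the two minima (0, 1) or (0, -1).
near_minimum = sum(
    abs(x) < 1e-6 and abs(abs(y) - 1.0) < 1e-6 for x, y in finals
)

# A start exactly on the line y = 0 stays on it and slides into the saddle.
on_manifold = gradient_descent(0.5, 0.0)

print(f"{near_minimum} of {len(finals)} random starts reached a minimum")
print(f"start on y = 0 ended at {on_manifold}")
```

In this toy setting, no randomness is used after the starting point is drawn and no second-order information is computed, which mirrors the setting the abstract describes. The hand-picked start on y = 0 shows the exceptional set that a random initialization hits with probability zero.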
Cite
@article{arxiv.1710.07406,
title = {First-order Methods Almost Always Avoid Saddle Points},
author = {Jason D. Lee and Ioannis Panageas and Georgios Piliouras and Max Simchowitz and Michael I. Jordan and Benjamin Recht},
journal = {arXiv preprint arXiv:1710.07406},
year = {2017}
}