Related papers: Efficient Dictionary Learning with Gradient Descen…

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Gradient descent and its variants are widely used in machine learning. However, oracle access of gradient may not be available in many applications, limiting the direct use of gradient descent. This paper proposes a method of estimating…

Optimization and Control · Mathematics 2019-10-07 Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points.…

Machine Learning · Computer Science 2015-03-10 Rong Ge, Furong Huang, Chi Jin, Yang Yuan

On the global convergence of randomized coordinate gradient descent for non-convex optimization

In this work, we analyze the global convergence property of coordinate gradient descent with random choice of coordinates and stepsizes for non-convex optimization problems. Under generic assumptions, we prove that the algorithm iterate…

Optimization and Control · Mathematics 2022-12-01 Ziang Chen, Yingzhou Li, Jianfeng Lu

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in…

Multiagent Systems · Computer Science 2020-04-01 Stefan Vlaski, Ali H. Sayed

Inexact First-Order Primal-Dual Algorithms

In this paper we investigate the convergence of a recently popular class of first-order primal-dual algorithms for saddle point problems under the presence of errors occurring in the proximal maps and gradients. We study several types of…

Optimization and Control · Mathematics 2020-02-26 Julian Rasch, Antonin Chambolle

Escaping Saddle Points with the Successive Convex Approximation Algorithm

Optimizing non-convex functions is of primary importance in the vast majority of machine learning algorithms. Even though many gradient descent based algorithms have been studied, successive convex approximation based algorithms have been…

Optimization and Control · Mathematics 2019-03-06 Amrit Singh Bedi, Ketan Rajawat, Vaneet Aggarwal

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Large-scale optimization problems require algorithms both effective and efficient. One such popular and proven algorithm is Stochastic Gradient Descent which uses first-order gradient information to solve these problems. This paper studies…

Optimization and Control · Mathematics 2021-11-11 Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris

Dealing with unbounded gradients in stochastic saddle-point optimization

We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may…

Machine Learning · Computer Science 2024-06-10 Gergely Neu, Nneka Okolo

Learning to Initialize Gradient Descent Using Gradient Descent

Non-convex optimization problems are challenging to solve; the success and computational expense of a gradient descent algorithm or variant depend heavily on the initialization strategy. Often, either random initialization is used or…

Machine Learning · Computer Science 2020-12-23 Kartik Ahuja, Amit Dhurandhar, Kush R. Varshney

Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound

The study of first-order optimization is sensitive to the assumptions made on the objective functions. These assumptions induce complexity classes which play a key role in worst-case analysis, including the fundamental concept of algorithm…

Optimization and Control · Mathematics 2024-05-30 Charles Guille-Escuret, Adam Ibrahim, Baptiste Goujaud, Ioannis Mitliagkas

Variance Reduction for Faster Non-Convex Optimization

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order…

Optimization and Control · Mathematics 2016-08-26 Zeyuan Allen-Zhu, Elad Hazan

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points

Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable…

Machine Learning · Computer Science 2019-09-05 Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…

Machine Learning · Computer Science 2014-06-11 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

On Graduated Optimization for Stochastic Non-Convex Problems

The graduated optimization approach, also known as the continuation method, is a popular heuristic to solving non-convex problems that has received renewed interest over the last decade. Despite its popularity, very little is known in terms…

Machine Learning · Computer Science 2015-07-28 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

In this paper, we present some theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems in learning large-scale neural networks (NN). After introducing a mathematical tool…

Machine Learning · Computer Science 2023-05-01 Hui Jiang

On the saddle point problem for non-convex optimization

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such…

Machine Learning · Computer Science 2014-05-29 Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Convergence rate analysis of the gradient descent-ascent method for convex-concave saddle-point problems

In this paper, we study the gradient descent-ascent method for convex-concave saddle-point problems. We derive a new non-asymptotic global convergence rate in terms of distance to the solution set by using the semidefinite programming…

Optimization and Control · Mathematics 2022-09-19 Moslem Zamani, Hadi Abbaszadehpeivasti, Etienne de Klerk

On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Optimization

Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine…

Optimization and Control · Mathematics 2019-02-06 Yi Xu, Zhuoning Yuan, Sen Yang, Rong Jin, Tianbao Yang

Subgradient Descent Learns Orthogonal Dictionaries

This paper concerns dictionary learning, i.e., sparse coding, a fundamental representation learning problem. We show that a subgradient descent algorithm, with random initialization, can provably recover orthogonal dictionaries on a natural…

Machine Learning · Computer Science 2019-07-02 Yu Bai, Qijia Jiang, Ju Sun