Related papers: Learning to Initialize Gradient Descent Using Grad…

Accelerated Dual Learning by Homotopic Initialization

Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning. We investigate how proper initialization can have a…

Machine Learning · Computer Science 2017-06-14 Hadi Daneshmand, Hamed Hassani, Thomas Hofmann

Efficient Dictionary Learning with Gradient Descent

Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of…

Optimization and Control · Mathematics 2018-09-28 Dar Gilboa, Sam Buchanan, John Wright

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili, Changliu Liu

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Large-scale optimization problems require algorithms both effective and efficient. One such popular and proven algorithm is Stochastic Gradient Descent which uses first-order gradient information to solve these problems. This paper studies…

Optimization and Control · Mathematics 2021-11-11 Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science 2020-06-09 Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered…

Machine Learning · Computer Science 2025-11-03 John J. Vastola, Samuel J. Gershman, Kanaka Rajan

The Alternating Descent Conditional Gradient Method for Sparse Inverse Problems

We propose a variant of the classical conditional gradient method for sparse inverse problems with differentiable measurement models. Such models arise in many practical problems including superresolution, time-series modeling, and matrix…

Optimization and Control · Mathematics 2015-07-07 Nicholas Boyd, Geoffrey Schiebinger, Benjamin Recht

Stopping Rules for Gradient Methods for Non-Convex Problems with Additive Noise in Gradient

We study the gradient method under the assumption that an additively inexact gradient is available for, generally speaking, non-convex problems. The non-convexity of the objective function, as well as the use of an inexactness specified…

Optimization and Control · Mathematics 2022-12-13 Boris T. Polyak, Ilia A. Kuruzov, Fedor S. Stonyakin

Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces

Learning to optimize - the idea that we can learn from data algorithms that optimize a numerical criterion - has recently been at the heart of a growing number of research efforts. One of the most challenging issues within this approach is…

Machine Learning · Computer Science 2018-02-21 Louis Faury, Flavian Vasile

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type of methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of…

Optimization and Control · Mathematics 2020-06-15 Darina Dvinskikh, Aleksandr Ogaltsov, Alexander Gasnikov, Pavel Dvurechensky, Alexander Tyurin, Vladimir Spokoiny

Machine learning approach to chance-constrained problems: An algorithm based on the stochastic gradient descent

We consider chance-constrained problems with discrete random distribution. We aim for problems with a large number of scenarios. We propose a novel method based on the stochastic gradient descent method which performs updates of the…

Optimization and Control · Mathematics 2019-05-28 Lukáš Adam, Martin Branda

Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate

Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments.…

Optimization and Control · Mathematics 2019-07-04 Stefan Vlaski, Ali H. Sayed

Gradient descent revisited via an adaptive online learning rate

Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut, Satya Gorti

Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy

Despite the recent success of stochastic gradient descent in deep learning, it is often difficult to train a deep neural network with an inappropriate choice of its initial parameters. Even if training is successful, it has been known that…

Machine Learning · Computer Science 2023-02-10 Cheolhyoung Lee, Kyunghyun Cho

Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and…

Machine Learning · Computer Science 2022-09-27 Dominik Stöger, Mahdi Soltanolkotabi

Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and…

Machine Learning · Computer Science 2021-06-16 Tian Tong, Cong Ma, Yuejie Chi

Descent-to-Delete: Gradient-Based Methods for Machine Unlearning

We study the data deletion problem for convex models. By leveraging techniques from convex optimization and reservoir sampling, we give the first data deletion algorithms that are able to handle an arbitrarily long sequence of adversarial…

Machine Learning · Statistics 2020-07-07 Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

Iterative Implicit Gradients for Nonconvex Optimization with Variational Inequality Constraints

We propose an optimization proxy in terms of iterative implicit gradient methods for solving constrained optimization problems with nonconvex loss functions. This framework can be applied to a broad range of machine learning settings,…

Optimization and Control · Mathematics 2025-10-14 Harshal D. Kaushik, Ming Jin

On the Initialization for Convex-Concave Min-max Problems

Convex-concave min-max problems are ubiquitous in machine learning, and people usually utilize first-order methods (e.g., gradient descent ascent) to find the optimal solution. One feature which separates convex-concave min-max problems…

Optimization and Control · Mathematics 2022-03-09 Mingrui Liu, Francesco Orabona