Related papers: Gradient Descent Methods for Regularized Optimizat…

Gradient Descent in the Absence of Global Lipschitz Continuity of the Gradients

Gradient descent (GD) is a collection of continuous optimization methods that have achieved immeasurable success in practice. Owing to data science applications, GD with diminishing step sizes has become a prominent variant. While this…

Optimization and Control · Mathematics · 2023-06-27 · Vivak Patel, Albert S. Berahas

Generalization to the Natural Gradient Descent

Optimization problem, which is aimed at finding the global minimal value of a given cost function, is one of the central problems in science and engineering. Various numerical methods have been proposed to solve this problem, among which the…

Optimization and Control · Mathematics · 2022-10-07 · Shaojun Dong, Fengyu Le, Meng Zhang, Si-Jing Tao, Chao Wang, Yong-Jian Han, Guo-Ping Guo

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be $O(\log(T)/T)$, by running SGD for…

Machine Learning · Computer Science · 2015-03-19 · Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

Stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from a viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics · 2023-08-15 · Da Li, Jingjing Wu, Qingrun Zhang

Beyond the Edge of Stability via Two-step Gradient Updates

Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where…

Machine Learning · Computer Science · 2023-07-27 · Lei Chen, Joan Bruna

A New Perspective of Accelerated Gradient Methods: The Controlled Invariant Manifold Approach

Gradient Descent (GD) is a ubiquitous algorithm for finding the optimal solution to an optimization problem. For reduced computational complexity, the optimal solution $\mathrm{x^*}$ of the optimization problem must be attained in a minimum…

Optimization and Control · Mathematics · 2023-06-01 · Revati Gunjal, Sushama Wagh, Syed Shadab Nayyer, Alex Stankovic, Navdeep M. Singh

A Unified Approach to Controlling Implicit Regularization via Mirror Descent

Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how…

Machine Learning · Computer Science · 2024-01-12 · Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

Using gradient descent (GD) with fixed or decaying step-size is a standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it…

Machine Learning · Statistics · 2023-02-03 · Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

Stochastic Optimization of Large-Scale Parametrized Dynamical Systems

Many relevant problems in the area of systems and control, such as controller synthesis, observer design and model reduction, can be viewed as optimization problems involving dynamical systems: for instance, maximizing performance in the…

Optimization and Control · Mathematics · 2023-11-15 · Pascal Den Boef, Jos Maubach, Wil Schilders, Nathan van de Wouw

Beyond Convexity: Stochastic Quasi-Convex Optimization

Stochastic convex optimization is a basic and well studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient…

Machine Learning · Computer Science · 2015-10-29 · Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

Obtaining Adjustable Regularization for Free via Iterate Averaging

Regularization for optimization is a crucial technique to avoid overfitting in machine learning. In order to obtain the best performance, we usually train a model by tuning the regularization parameters. It becomes costly, however, when a…

Machine Learning · Computer Science · 2020-08-18 · Jingfeng Wu, Vladimir Braverman, Lin F. Yang

Adaptive Proximal Gradient Method for Convex Optimization

In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local…

Optimization and Control · Mathematics · 2024-02-13 · Yura Malitsky, Konstantin Mishchenko

AutoGD: Automatic Learning Rate Selection for Gradient Descent

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science · 2025-10-14 · Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell

Implicit vs. explicit regularization for high-dimensional gradient descent

In this paper we investigate the generalization error of gradient descent (GD) applied to an $\ell_2$-regularized OLS objective function in the linear model. Based on our analysis we develop new methodology for computationally tractable and…

Statistics Theory · Mathematics · 2026-01-27 · Thomas Stark, Lukas Steinberger

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor…

Optimization and Control · Mathematics · 2023-10-25 · Ziye Ma, Javad Lavaei, Somayeh Sojoudi

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective,…

Machine Learning · Statistics · 2025-11-04 · Jingfeng Wu, Pierre Marion, Peter Bartlett

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e., where each instantaneous loss is a scalar convex function of a linear function. We show…

Machine Learning · Computer Science · 2022-11-01 · Idan Amir, Roi Livni, Nathan Srebro

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness…

Machine Learning · Computer Science · 2013-01-01 · Ohad Shamir, Tong Zhang

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often…

Machine Learning · Computer Science · 2020-06-09 · Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

Stochastic Proximal Gradient Descent for Nuclear Norm Regularization

In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size $m \times n$. By constructing a low-rank estimate of…

Machine Learning · Computer Science · 2015-12-08 · Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou