Related papers: Adaptive Conditional Gradient Descent

Stochastic Adaptive Gradient Descent Without Descent

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter…

Machine Learning · Computer Science 2025-09-19 Jean-François Aujol, Jérémie Bigot, Camille Castera

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Stochastic gradient descent is the method of choice for large scale optimization of machine learning objective functions. Yet, its performance is greatly variable and heavily depends on the choice of the stepsizes. This has motivated a…

Machine Learning · Statistics 2019-02-28 Xiaoyu Li, Francesco Orabona

Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent

This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD) by utilizing quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local…

Optimization and Control · Mathematics 2024-09-19 Frederik Köhne, Leonie Kreis, Anton Schiela, Roland Herzog

Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving…

Optimization and Control · Mathematics 2024-11-12 Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective,…

Machine Learning · Statistics 2025-11-04 Jingfeng Wu, Pierre Marion, Peter Bartlett

Adaptive multi-gradient methods for quasiconvex vector optimization and applications to multi-task learning

We present an adaptive step-size method, which does not include line-search techniques, for solving a wide class of nonconvex multiobjective programming problems on an unbounded constraint set. We also prove convergence of a general…

Optimization and Control · Mathematics 2024-02-12 Nguyen Anh Minh, Le Dung Muu, Tran Ngoc Thang

Adaptive Preconditioned Gradient Descent with Energy

We propose an adaptive step size with an energy approach for a suitable class of preconditioned gradient descent methods. We focus on settings where the preconditioning is applied to address the constraints in optimization problems, such as…

Optimization and Control · Mathematics 2024-06-17 Hailiang Liu, Levon Nurbekyan, Xuping Tian, Yunan Yang

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type of methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of…

Optimization and Control · Mathematics 2020-06-15 Darina Dvinskikh, Aleksandr Ogaltsov, Alexander Gasnikov, Pavel Dvurechensky, Alexander Tyurin, Vladimir Spokoiny

Self-adaptive algorithms for quasiconvex programming and applications to machine learning

For solving a broad class of nonconvex programming problems on an unbounded constraint set, we provide a self-adaptive step-size strategy that does not include line-search techniques and establishes the convergence of a generic approach…

Optimization and Control · Mathematics 2022-12-14 Thang Tran Ngoc, Hai Trinh Ngoc

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings:…

Machine Learning · Computer Science 2023-06-13 Amit Attia, Tomer Koren

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

This paper proposes a novel proximal-gradient algorithm for a decentralized optimization problem with a composite objective containing smooth and non-smooth terms. Specifically, the smooth and nonsmooth terms are dealt with by gradient and…

Optimization and Control · Mathematics 2021-02-02 Zhi Li, Wei Shi, Ming Yan

Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

We present adaptive gradient methods (both basic and accelerated) for solving convex composite optimization problems in which the main part is approximately smooth (a.k.a. $(\delta, L)$-smooth) and can be accessed only via a (potentially…

Optimization and Control · Mathematics 2024-06-11 Anton Rodomanov, Xiaowen Jiang, Sebastian Stich

An adaptive framework for first-order gradient methods

Gradient methods are widely used in optimization problems. In practice, while the smoothness parameter can be estimated utilizing techniques such as backtracking, estimating the strong convexity parameter remains a challenge; moreover, even…

Optimization and Control · Mathematics 2026-02-17 Xiaozhe Hu, Sara Pollock, Zhongqin Xue, Yunrong Zhu

Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants…

Optimization and Control · Mathematics 2022-01-03 Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

On the Convergence of Step Decay Step-Size for Stochastic Optimization

The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice…

Optimization and Control · Mathematics 2021-02-19 Xiaoyu Wang, Sindri Magnússon, Mikael Johansson

First-ish Order Methods: Hessian-aware Scalings of Gradient Descent

Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural…

Optimization and Control · Mathematics 2025-07-15 Oscar Smee, Fred Roosta, Stephen J. Wright

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/\mu})$ gradient…

Machine Learning · Computer Science 2023-12-07 Yuanshi Liu, Hanzhen Zhao, Yang Xu, Pengyun Yue, Cong Fang

Second-order Properties of Noisy Distributed Gradient Descent

We study a fixed step-size noisy distributed gradient descent algorithm for solving optimization problems in which the objective is a finite sum of smooth but possibly non-convex functions. Random perturbations are introduced to the…

Optimization and Control · Mathematics 2023-07-21 Lei Qin, Michael Cantoni, Ye Pu

Adaptive Step-Size Methods for Compressed SGD

Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning.…

Machine Learning · Statistics 2022-07-21 Adarsh M. Subramaniam, Akshayaa Magesh, Venugopal V. Veeravalli

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma