Related papers: Doubly Adaptive Scaled Algorithm for Machine Learn…

Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving…

Optimization and Control · Mathematics 2024-11-12 Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

First-ish Order Methods: Hessian-aware Scalings of Gradient Descent

Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural…

Optimization and Control · Mathematics 2025-07-15 Oscar Smee, Fred Roosta, Stephen J. Wright

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…

Machine Learning · Computer Science 2021-03-08 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning

In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning. In the first-order case, we propose a framework of transition from deterministic or…

Machine Learning · Computer Science 2021-11-30 Sanae Lotfi, Tiphaine Bonniot de Ruisselet, Dominique Orban, Andrea Lodi

Adaptive scaling of the learning rate by second order automatic differentiation

In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the curvature, a second order…

Neural and Evolutionary Computing · Computer Science 2022-10-27 Frédéric de Gournay, Alban Gossard

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Adaptive Optimization Algorithms for Machine Learning

Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning…

Machine Learning · Computer Science 2023-11-20 Slavomír Hanzely

Improving Adaptive Online Learning Using Refined Discretization

We study unconstrained Online Linear Optimization with Lipschitz losses. Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves (i) the AdaGrad-style second order gradient adaptivity; and…

Machine Learning · Computer Science 2024-02-23 Zhiyu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis

Second-Order Stochastic Optimization for Machine Learning in Linear Time

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics 2017-12-01 Naman Agarwal, Brian Bullins, Elad Hazan

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems

In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages…

Optimization and Control · Mathematics 2020-05-26 Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Oblivious Stochastic Composite Optimization

In stochastic convex optimization problems, most existing adaptive methods rely on prior knowledge about the diameter bound $D$ when the smoothness or the Lipschitz constant is unknown. This often significantly affects performance as only a…

Optimization and Control · Mathematics 2025-10-08 Clément Lezane, Alexandre d'Aspremont

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up…

Machine Learning · Computer Science 2019-05-31 Zakaria Mhammedi, Wouter M. Koolen, Tim van Erven

Second-order Quantile Methods for Experts and Combinatorial Games

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision…

Machine Learning · Computer Science 2015-03-02 Wouter M. Koolen, Tim van Erven

Adaptive Conditional Gradient Descent

Selecting an effective step-size is a fundamental challenge in first-order optimization, especially for problems with non-Euclidean geometries. This paper presents a novel adaptive step-size strategy for optimization algorithms that rely on…

Optimization and Control · Mathematics 2025-10-14 Abbas Khademi, Antonio Silveti-Falls

Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm

Federated learning is a machine learning approach where multiple devices collaboratively learn with the help of a parameter server by sharing only their local updates. While gradient-based optimization techniques are widely adopted in this…

Machine Learning · Computer Science 2024-06-12 Ahmed Elbakary, Chaouki Ben Issaid, Mohammad Shehab, Karim Seddik, Tamer ElBatt, Mehdi Bennis

Adaptive and Oblivious Randomized Subspace Methods for High-Dimensional Optimization: Sharp Analysis and Lower Bounds

We propose novel randomized optimization methods for high-dimensional convex problems based on restrictions of variables to random subspaces. We consider oblivious and data-adaptive subspaces and study their approximation properties via…

Information Theory · Computer Science 2020-12-15 Jonathan Lacotte, Mert Pilanci

Scalable Second-Order Optimization Algorithms for Minimizing Low-rank Functions

We present a random-subspace variant of cubic regularization algorithm that chooses the size of the subspace adaptively, based on the rank of the projected second derivative matrix. Iteratively, our variant only requires access to…

Optimization and Control · Mathematics 2025-01-09 Edward Tansley, Coralia Cartis

A Subsampling Line-Search Method with Second-Order Results

In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic…

Optimization and Control · Mathematics 2021-07-01 El-houcine Bergou, Youssef Diouane, Vladimir Kunc, Vyacheslav Kungurtsev, Clément W. Royer