Related papers: A Split-Client Approach to Second-Order Optimizati…

200 papers

We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local…

We establish a local $\mathcal{O}(k^{-2})$ rate for the gradient update $x^{k+1}=x^k-\nabla f(x^k)/\sqrt{H\|\nabla f(x^k)\|}$ under a $2H$-Hessian-Lipschitz assumption. Regime detection relies on Hessian-vector products, avoiding Hessian…

Optimization and Control · Mathematics 2025-09-24 Nazarii Tupitsa
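
As a rough illustration of the update rule quoted in the abstract above, the following minimal sketch applies one step $x^{k+1}=x^k-\nabla f(x^k)/\sqrt{H\|\nabla f(x^k)\|}$ in plain Python. The function names `gradient_step` and `grad_cubic`, the toy objective $f(x)=\|x\|^3/3$, and the value `H = 4.0` are assumptions made here for illustration only; they are not taken from the paper, and the regime detection mentioned in the abstract is not modeled.

```python
import math


def gradient_step(x, grad, H):
    """Return x - grad / sqrt(H * ||grad||) for a list-valued point x.

    H is a user-supplied positive constant (hypothetical value in this sketch).
    """
    g_norm = math.sqrt(sum(g * g for g in grad))
    if g_norm == 0.0:
        # Gradient is zero: the step is undefined, so stay in place.
        return list(x)
    denom = math.sqrt(H * g_norm)
    return [xi - gi / denom for xi, gi in zip(x, grad)]


def grad_cubic(x):
    """Gradient of the toy objective f(x) = ||x||^3 / 3, which is ||x|| * x."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [norm * xi for xi in x]


# One step from x = (3, 4) on the toy objective, with an assumed H = 4.
x = [3.0, 4.0]
x_next = gradient_step(x, grad_cubic(x), H=4.0)
print(x_next)
```

On this toy objective the step length works out to $\|x\|/\sqrt{H}$, so with the assumed `H = 4.0` each step halves the distance to the minimizer at the origin. Nothing here reproduces the rate analysis in the paper.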

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years,…

Machine Learning · Computer Science 2024-03-06 Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can…

Machine Learning · Computer Science 2026-05-28 Qiyuan Chen, Xian Wu, Yi Wang, Xianhao Chen

We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the…

Optimization and Control · Mathematics 2023-06-16 Nikita Doikov, El Mahdi Chayti, Martin Jaggi

This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$,…

Optimization and Control · Mathematics 2022-12-13 Kimon Antonakopoulos, Ali Kavis, Volkan Cevher

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically the Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient…

Machine Learning · Computer Science 2021-11-09 Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou

In this work we derive a second-order approach to bilevel optimization, a type of mathematical programming in which the solution to a parameterized optimization problem (the "lower" problem) is itself to be optimized (in the "upper"…

Optimization and Control · Mathematics 2022-05-06 Robert Dyro, Edward Schmerling, Nikos Arechiga, Marco Pavone

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second…

Machine Learning · Computer Science 2021-03-08 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that…

Machine Learning · Computer Science 2025-11-18 Wenzhang Du

In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the curvature, a second order…

Neural and Evolutionary Computing · Computer Science 2022-10-27 Frédéric de Gournay, Alban Gossard

In this paper, we present a new Hyperfast Second-Order Method with convergence rate $O(N^{-5})$, up to a logarithmic factor, for convex functions with Lipschitz third derivative. This method is based on two ideas. The first comes from the…

Optimization and Control · Mathematics 2020-06-30 Dmitry Kamzolov, Alexander Gasnikov

Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in…

Optimization and Control · Mathematics 2025-10-14 Abdurakhmon Sadiev, Peter Richtárik, Ilyas Fatkhullin

In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference…

Optimization and Control · Mathematics 2023-09-06 Nikita Doikov, Geovani Nunes Grapiglia

This paper explores second-order optimization methods in Federated Learning (FL), addressing the critical challenges of slow convergence and the excessive communication rounds required to achieve optimal performance from the global model.…

Machine Learning · Computer Science 2025-05-30 Mrinmay Sen, Sidhant R Nair, C Krishna Mohan

Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the Quadratic Gradient (QG) and rigorously…

Optimization and Control · Mathematics 2026-04-01 John Chiang

A class of second-order algorithms is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps in an iteration-dependent subspace. In most cases, the Hessian matrix is…

Optimization and Control · Mathematics 2023-08-22 Serge Gratton, Sadok Jerad, Philippe L. Toint

We here adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…

Numerical Analysis · Mathematics 2020-09-15 Stefania Bellavia, Gianmarco Gurioli