Related papers: A Split-Client Approach to Second-Order Optimizati…

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is a provably convergent, second order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local…

Machine Learning · Statistics 2017-08-07 Kamer Kaya, Figen Öztoprak, Ş. İlker Birbil, A. Taylan Cemgil, Umut Şimşekli, Nurdan Kuru, Hazal Koptagel, M. Kaan Öztürk

CaCuTe: Casual Cubic-Model Technique for Faster Optimization

We establish a local $\mathcal{O}(k^{-2})$ rate for the gradient update $x^{k+1}=x^k-\nabla f(x^k)/\sqrt{H\|\nabla f(x^k)\|}$ under a $2H$-Hessian-Lipschitz assumption. Regime detection relies on Hessian-vector products, avoiding Hessian…

Optimization and Control · Mathematics 2025-09-24 Nazarii Tupitsa

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years,…

Machine Learning · Computer Science 2024-03-06 Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can…

Machine Learning · Computer Science 2026-05-28 Qiyuan Chen, Xian Wu, Yi Wang, Xianhao Chen

Second-order optimization with lazy Hessians

We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the…

Optimization and Control · Mathematics 2023-06-16 Nikita Doikov, El Mahdi Chayti, Martin Jaggi

Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods

This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$,…

Optimization and Control · Mathematics 2022-12-13 Kimon Antonakopoulos, Ali Kavis, Volkan Cevher

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Second-Order Neural ODE Optimizer

We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically the Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient…

Machine Learning · Computer Science 2021-11-09 Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou

Second-Order Sensitivity Analysis for Bilevel Optimization

In this work we derive a second-order approach to bilevel optimization, a type of mathematical programming in which the solution to a parameterized optimization problem (the "lower" problem) is itself to be optimized (in the "upper"…

Optimization and Control · Mathematics 2022-05-06 Robert Dyro, Edward Schmerling, Nikos Arechiga, Marco Pavone

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, that involve second derivatives and/or second…

Machine Learning · Computer Science 2021-03-08 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching

First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that…

Machine Learning · Computer Science 2025-11-18 Wenzhang Du

Adaptive scaling of the learning rate by second order automatic differentiation

In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new technique of automatic differentiation. This technique relies on the computation of the curvature, a second order…

Neural and Evolutionary Computing · Computer Science 2022-10-27 Frédéric de Gournay, Alban Gossard

Near-Optimal Hyperfast Second-Order Method for convex optimization and its Sliding

In this paper, we present a new Hyperfast Second-Order Method with convergence rate $O(N^{-5})$, up to a logarithmic factor, for convex functions with a Lipschitz third derivative. This method is based on two ideas. The first comes from the…

Optimization and Control · Mathematics 2020-06-30 Dmitry Kamzolov, Alexander Gasnikov

Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits

Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in…

Optimization and Control · Mathematics 2025-10-14 Abdurakhmon Sadiev, Peter Richtárik, Ilyas Fatkhullin

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference…

Optimization and Control · Mathematics 2023-09-06 Nikita Doikov, Geovani Nunes Grapiglia

Accelerated Training of Federated Learning via Second-Order Methods

This paper explores second-order optimization methods in Federated Learning (FL), addressing the critical challenges of slow convergence and the excessive communication rounds required to achieve optimal performance from the global model.…

Machine Learning · Computer Science 2025-05-30 Mrinmay Sen, Sidhant R Nair, C Krishna Mohan

Quadratic Gradient: A Unified Framework Bridging Gradient Descent and Newton-Type Methods by Synthesizing Hessians and Gradients

Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the Quadratic Gradient (QG) and rigorously…

Optimization and Control · Mathematics 2026-04-01 John Chiang

Yet another fast variant of Newton's method for nonconvex optimization

A class of second-order algorithms is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps in an iteration-dependent subspace. In most cases, the Hessian matrix is…

Optimization and Control · Mathematics 2023-08-22 Serge Gratton, Sadok Jerad, Philippe L. Toint

Stochastic Analysis of an Adaptive Cubic Regularisation Method under Inexact Gradient Evaluations and Dynamic Hessian Accuracy

We here adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…

Numerical Analysis · Mathematics 2020-09-15 Stefania Bellavia, Gianmarco Gurioli