Related papers: Exact Stochastic Second Order Deep Learning

Second-order Neural Network Training Using Complex-step Directional Derivative

While the superior performance of second-order optimization methods such as Newton's method is well known, they are hardly used in practice for deep learning because neither assembling the Hessian matrix nor calculating its inverse is…

Machine Learning · Computer Science 2020-09-16 Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Feng Luo, Yin Yang

Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient

Trust region and cubic regularization methods have demonstrated good performance in small scale non-convex optimization, showing the ability to escape from saddle points. Each iteration of these methods involves computation of gradient,…

Optimization and Control · Mathematics 2018-09-27 Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of…

Optimization and Control · Mathematics 2018-02-19 Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney

Second-Order Methods with Cubic Regularization Under Inexact Information

In this paper, we generalize (accelerated) Newton's method with cubic regularization under inexact second-order information for (strongly) convex optimization problems. Under mild assumptions, we provide global rate of convergence of these…

Optimization and Control · Mathematics 2017-10-17 Saeed Ghadimi, Han Liu, Tong Zhang

Nys-Newton: Nyström-Approximated Curvature for Stochastic Optimization

Second-order optimization methods are among the most widely used optimization approaches for convex optimization problems, and have recently been used to optimize non-convex optimization problems such as deep learning models. The widely…

Optimization and Control · Mathematics 2022-02-01 Dinesh Singh, Hardik Tankaria, Makoto Yamada

Second-order optimization with lazy Hessians

We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the…

Optimization and Control · Mathematics 2023-06-16 Nikita Doikov, El Mahdi Chayti, Martin Jaggi

Stochastic Analysis of an Adaptive Cubic Regularisation Method under Inexact Gradient Evaluations and Dynamic Hessian Accuracy

We here adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…

Numerical Analysis · Mathematics 2020-09-15 Stefania Bellavia, Gianmarco Gurioli

Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks

Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the…

Machine Learning · Computer Science 2024-02-28 Elre T. Oldewage, Ross M. Clarke, José Miguel Hernández-Lobato

An Exact Distributed Newton Method for Reinforcement Learning

In this paper, we propose a distributed second-order method for reinforcement learning. Our approach is the fastest in the literature so far, as it outperforms state-of-the-art methods, including ADMM, by significant margins. We achieve this by…

Optimization and Control · Mathematics 2016-08-08 Rasul Tutunov, Haitham Bou-Ammar, Ali Jadbabaie

Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence

In this paper, we study stochastic non-convex optimization with non-convex random functions. Recent studies on non-convex optimization revolve around establishing second-order convergence, i.e., converging to a nearly second-order optimal…

Optimization and Control · Mathematics 2017-11-02 Mingrui Liu, Tianbao Yang

Second-Order Stochastic Optimization for Machine Learning in Linear Time

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics 2017-12-01 Naman Agarwal, Brian Bullins, Elad Hazan

Faster Differentially Private Convex Optimization via Second-Order Methods

Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than…

Machine Learning · Computer Science 2023-05-23 Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

Non-Convex Self-Concordant Functions: Practical Algorithms and Complexity Analysis

We extend the standard notion of self-concordance to non-convex optimization and develop a family of second-order algorithms with global convergence guarantees. In particular, two function classes -- weakly self-concordant…

Optimization and Control · Mathematics 2026-04-07 Donald Goldfarb, Lexiao Lai, Tianyi Lin, Jiayu Zhang

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization

Rapid advances in data collection and processing capabilities have allowed for the use of increasingly complex models that give rise to nonconvex optimization problems. These formulations, however, can be arbitrarily difficult to solve in…

Multiagent Systems · Computer Science 2020-04-01 Stefan Vlaski, Ali H. Sayed

A Subsampling Line-Search Method with Second-Order Results

In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic…

Optimization and Control · Mathematics 2021-07-01 El-houcine Bergou, Youssef Diouane, Vladimir Kunc, Vyacheslav Kungurtsev, Clément W. Royer

Nonlinear discretizations and Newton's method: characterizing stationary points of regression objectives

Second-order methods are emerging as promising alternatives to standard first-order optimizers such as gradient descent and ADAM for training neural networks. Though the advantages of including curvature information in computing…

Machine Learning · Computer Science 2025-10-15 Conor Rowan

Adaptive Regularized Newton Method with Inexact Hessian

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newton's method is its lack of global convergence and high iteration cost.…

Optimization and Control · Mathematics 2025-12-10 Aleksandr Shestakov, Nail Bashirov, Andrei Semenov, Alexander Gasnikov, Martin Takáč, Aleksandr Beznosikov, Dmitry Kamzolov

Nestrov's Acceleration For Second Order Method

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if…

Machine Learning · Computer Science 2017-10-25 Haishan Ye, Zhihua Zhang

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher