Related papers: Hessian approximations

Approximating the diagonal of a Hessian: which sample set of points should be used

An explicit formula to approximate the diagonal entries of the Hessian is introduced. When the derivative-free technique called generalized centered simplex gradient is used to approximate the gradient, then the formula can be…

Numerical Analysis · Mathematics 2021-04-27 Gabriel Jarry-Bolduc

A matrix algebra approach to approximate Hessians

This work presents a novel matrix-based method for constructing an approximate Hessian using only function evaluations. The method requires less computational power than interpolation-based methods and is easy to implement in matrix-based…

Numerical Analysis · Mathematics 2023-04-07 W. Hare, G. Jarry-Bolduc, C. Planiden

Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness

We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which occurs typically in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves…

Optimization and Control · Mathematics 2024-05-28 Artem Agafonov, Dmitry Kamzolov, Alexander Gasnikov, Ali Kavis, Kimon Antonakopoulos, Volkan Cevher, Martin Takáč

Using generalized simplex methods to approximate derivatives

This paper presents two methods for approximating a proper subset of the entries of a Hessian using only function evaluations. These approximations are obtained using the techniques called generalized simplex Hessian and…

Numerical Analysis · Mathematics 2025-05-14 Gabriel Jarry-Bolduc, Chayne Planiden

Gradient and Hessian approximations in Derivative Free Optimization

This work investigates finite differences and the use of interpolation models to obtain approximations to the first and second derivatives of a function. Here, it is shown that if a particular set of points is used in the interpolation…

Optimization and Control · Mathematics 2020-01-24 Ian D. Coope, Rachael Tappenden

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…

Machine Learning · Computer Science 2017-12-21 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Nys-Newton: Nyström-Approximated Curvature for Stochastic Optimization

Second-order optimization methods are among the most widely used optimization approaches for convex optimization problems, and have recently been used to optimize non-convex optimization problems such as deep learning models. The widely…

Optimization and Control · Mathematics 2022-02-01 Dinesh Singh, Hardik Tankaria, Makoto Yamada

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference…

Optimization and Control · Mathematics 2023-09-06 Nikita Doikov, Geovani Nunes Grapiglia

HesScale: Scalable Computation of Hessian Diagonals

Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a…

Machine Learning · Computer Science 2022-11-03 Mohamed Elsayed, A. Rupam Mahmood

Stochastic Analysis of an Adaptive Cubic Regularisation Method under Inexact Gradient Evaluations and Dynamic Hessian Accuracy

We here adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…

Numerical Analysis · Mathematics 2020-09-15 Stefania Bellavia, Gianmarco Gurioli

The bilinear Hessian for large scale optimization

Second order information is useful in many ways in smooth optimization problems, including for the design of step size rules and descent directions, or the analysis of the local properties of the objective functional. However, the…

Optimization and Control · Mathematics 2025-02-06 Marcus Carlsson, Viktor Nikitin, Erik Troedsson, Herwig Wendt

Learning-Augmented Sketches for Hessians

Sketching is a dimensionality reduction technique where one compresses a matrix by linear combinations that are chosen at random. A line of work has shown how to sketch the Hessian to speed up each iteration in a second order method, but…

Machine Learning · Computer Science 2021-10-07 Yi Li, Honghao Lin, David P. Woodruff

Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Distributed Averaging Methods for Randomized Second Order Optimization

We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second order optimization…

Machine Learning · Statistics 2020-02-18 Burak Bartan, Mert Pilanci

Iterative algorithm with structured diagonal Hessian approximation for solving nonlinear least squares problems

Nonlinear least-squares problems are a special class of unconstrained optimization problems in which the gradient and Hessian have special structures. In this paper, we exploit these structures and propose a matrix-free algorithm with a…

Optimization and Control · Mathematics 2020-02-06 Aliyu Muhammed Awwal, Poom Kumam, Hassan Mohammad

Adaptive Regularized Newton Method with Inexact Hessian

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newton's method is its lack of global convergence and high iteration cost.…

Optimization and Control · Mathematics 2025-12-10 Aleksandr Shestakov, Nail Bashirov, Andrei Semenov, Alexander Gasnikov, Martin Takáč, Aleksandr Beznosikov, Dmitry Kamzolov

On the Convergence Theory for Hessian-Free Bilevel Algorithms

Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and…

Machine Learning · Computer Science 2022-06-07 Daouda Sow, Kaiyi Ji, Yingbin Liang

Nested Bayesian Optimization for Computer Experiments

Computer experiments can emulate the physical systems, help computational investigations, and yield analytic solutions. They have been widely employed with many engineering applications (e.g., aerospace, automotive, energy systems.…

Methodology · Statistics 2022-08-23 Yan Wang, Meng Wang, Areej AlBahar, Xiaowei Yue

Stochastic Hessian Fittings with Lie Groups

This report investigates the fitting of the Hessian or its inverse for stochastic optimizations using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related…

Machine Learning · Statistics 2025-12-02 Xi-Lin Li