Related papers: Hessian approximations
An explicit formula to approximate the diagonal entries of the Hessian is introduced. When the derivative-free technique called generalized centered simplex gradient is used to approximate the gradient, the formula can be…
This work presents a novel matrix-based method for constructing an approximate Hessian using only function evaluations. The method requires less computational power than interpolation-based methods and is easy to implement in matrix-based…
We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, as typically occurs in machine learning. We establish theoretical lower bounds and prove that our algorithm achieves…
This paper presents two methods for approximating a proper subset of the entries of a Hessian using only function evaluations. These approximations are obtained using the techniques called generalized simplex Hessian and…
This work investigates finite differences and the use of interpolation models to obtain approximations to the first and second derivatives of a function. Here, it is shown that if a particular set of points is used in the interpolation…
Large-scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are…
Second-order optimization methods are among the most widely used optimization approaches for convex optimization problems, and have recently been used to optimize non-convex optimization problems such as deep learning models. The widely…
In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference…
Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a…
Here we adapt an extended version of the adaptive cubic regularisation method with dynamic inexact Hessian information for nonconvex optimisation in [3] to the stochastic optimisation setting. While exact function evaluations are still…
Second-order information is useful in many ways in smooth optimization problems, including for the design of step size rules and descent directions, or the analysis of the local properties of the objective functional. However, the…
Sketching is a dimensionality reduction technique where one compresses a matrix by linear combinations that are chosen at random. A line of work has shown how to sketch the Hessian to speed up each iteration in a second-order method, but…
An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second-order optimization…
Nonlinear least-squares problems are a special class of unconstrained optimization problems whose gradient and Hessian have special structure. In this paper, we exploit this structure and propose a matrix-free algorithm with a…
Newton's method is the most widespread high-order method, requiring the gradient and the Hessian of the objective function. However, two of the main disadvantages of Newton's method are its lack of global convergence and its high iteration cost.…
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and…
Computer experiments can emulate physical systems, aid computational investigations, and yield analytic solutions. They have been widely employed in many engineering applications (e.g., aerospace, automotive, energy systems.…
This report investigates the fitting of the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related…
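Several of the abstracts above concern approximating Hessian entries using only function evaluations (generalized simplex gradients and Hessians, finite differences, interpolation models). The sketch below does not reproduce any listed paper's method. It substitutes the textbook centered finite-difference stencils: (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2 for a diagonal entry, and a four-point stencil for each off-diagonal entry. The function names `fd_hessian_diagonal` and `fd_hessian`, and the default step `h = 1e-4`, are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Objective = Callable[[Sequence[float]], float]


def _shift(x: Sequence[float], i: int, step: float) -> Vector:
    """Return a copy of x with `step` added to coordinate i."""
    y = list(x)
    y[i] += step
    return y


def fd_hessian_diagonal(f: Objective, x: Sequence[float], h: float = 1e-4) -> Vector:
    """Hypothetical helper: centered second differences for H_ii (2n + 1 evaluations)."""
    if h <= 0.0:
        raise ValueError("step h must be positive")
    fx = f(x)
    return [
        (f(_shift(x, i, h)) - 2.0 * fx + f(_shift(x, i, -h))) / (h * h)
        for i in range(len(x))
    ]


def fd_hessian(f: Objective, x: Sequence[float], h: float = 1e-4) -> List[Vector]:
    """Hypothetical helper: full symmetric Hessian estimate from function values only."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    diag = fd_hessian_diagonal(f, x, h)
    for i in range(n):
        H[i][i] = diag[i]
        for j in range(i + 1, n):
            # Four-point centered stencil for the mixed partial d^2 f / dx_i dx_j.
            fpp = f(_shift(_shift(x, i, h), j, h))
            fpm = f(_shift(_shift(x, i, h), j, -h))
            fmp = f(_shift(_shift(x, i, -h), j, h))
            fmm = f(_shift(_shift(x, i, -h), j, -h))
            H[i][j] = H[j][i] = (fpp - fpm - fmp + fmm) / (4.0 * h * h)
    return H


if __name__ == "__main__":
    quad = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2
    print(fd_hessian(quad, [1.0, -2.0]))  # approximately [[2, 3], [3, 4]]
```

The diagonal alone costs 2n + 1 evaluations, while the full estimate costs 2n^2 + 1. That gap is one reason methods that target only a subset of Hessian entries, as in some of the abstracts above, can be attractive. The step h trades truncation error (of order h^2) against floating-point cancellation (of order machine epsilon / h^2), so h near the fourth root of machine epsilon is a common default for double precision.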