Related papers: Stochastic Gradient Estimation for Higher-order Di…

DOME: Improving Signal-to-Noise in Stochastic Gradient Descent via Sharp-Direction Subspace Filtering

Stochastic gradients for deep neural networks exhibit strong correlations along the optimization trajectory, and are often aligned with a small set of Hessian eigenvectors associated with outlier eigenvalues. Recent work shows that…

Machine Learning · Computer Science 2026-02-04 Julien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

Better SGD using Second-order Momentum

We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector…

Machine Learning · Computer Science 2021-07-13 Hoang Tran, Ashok Cutkosky

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently…

Machine Learning · Computer Science 2017-09-18 Sébastien M. R. Arnold, Chunming Wang

Inexact and Stochastic Gradient Optimization Algorithms with Inertia and Hessian Driven Damping

In a real Hilbert space setting, we study the convergence properties of an inexact gradient algorithm featuring both viscous and Hessian driven damping for convex differentiable optimization. In this algorithm, the gradient evaluation can…

Optimization and Control · Mathematics 2025-09-25 Harsh Choudhary, Jalal Fadili, Vyacheslav Kungurtsev

Random directions stochastic approximation with deterministic perturbations

We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order…

Optimization and Control · Mathematics 2019-03-29 Prashanth L A, Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus

Estimating the gradient and higher-order derivatives on quantum hardware

For a large class of variational quantum circuits, we show how arbitrary-order derivatives can be analytically evaluated in terms of simple parameter-shift rules, i.e., by running the same circuit with different shifts of the parameters. As…

Quantum Physics · Physics 2021-03-03 Andrea Mari, Thomas R. Bromley, Nathan Killoran

Gradient-based Hyperparameter Optimization through Reversible Learning

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the…

Machine Learning · Statistics 2015-04-03 Dougal Maclaurin, David Duvenaud, Ryan P. Adams

Numerical Computation of the Gradient and the Action of the Hessian for Time-Dependent PDE-Constrained Optimization Problems

We present a systematic derivation of the algorithms required for computing the gradient and the action of the Hessian of an arbitrary misfit function for large-scale parameter estimation problems involving linear time-dependent PDEs with…

Optimization and Control · Mathematics 2016-08-09 Kai Rothauge, Eldad Haber, Uri Ascher

Adaptive Sketches for Robust Regression with Importance Sampling

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale…

Machine Learning · Computer Science 2022-07-19 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

Training deep neural networks consumes increasing computational resource shares in many compute centers. Often, a brute force approach to obtain hyperparameter values is employed. Our goal is (1) to enhance this by enabling second-order…

Machine Learning · Computer Science 2022-08-04 Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz

Quadratic Gradient: A Unified Framework Bridging Gradient Descent and Newton-Type Methods by Synthesizing Hessians and Gradients

Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the Quadratic Gradient (QG) and rigorously…

Optimization and Control · Mathematics 2026-04-01 John Chiang

Higher-Order Convolution Improves Neural Predictivity in the Retina

We present a novel approach to neural response prediction that incorporates higher-order operations directly within convolutional neural networks (CNNs). Our model extends traditional 3D CNNs by embedding higher-order operations within the…

Computer Vision and Pattern Recognition · Computer Science 2025-05-13 Simone Azeglio, Victor Calbiague Garcia, Guilhem Glaziou, Peter Neri, Olivier Marre, Ulisse Ferrari

High-order covariant differentiation in applications to Helmholtz-Hodge decomposition on curved surfaces

A novel high-order numerical scheme is proposed to compute the covariant derivative, particularly for divergence and curl, on any curved surface. The proposed scheme does not require the construction of a curved axis or metric tensor, which…

Numerical Analysis · Mathematics 2020-04-30 Sehun Chun

A block coordinate descent optimizer for classification problems exploiting convexity

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method…

Machine Learning · Computer Science 2020-06-19 Ravi G. Patel, Nathaniel A. Trask, Mamikon A. Gulian, Eric C. Cyr

Gathering and Exploiting Higher-Order Information when Training Large Structured Models

When training large models, such as neural networks, the full derivatives of order 2 and beyond are usually inaccessible, due to their computational cost. Therefore, among the second-order optimization methods, it is common to bypass the…

Machine Learning · Computer Science 2025-10-01 Pierre Wolinski

Importance Sampling for Stochastic Gradient Descent in Deep Neural Networks

Stochastic gradient descent samples uniformly the training set to build an unbiased gradient estimate with a limited number of samples. However, at a given step of the training process, some data are more helpful than others to continue…

Machine Learning · Computer Science 2023-03-30 Thibault Lahire

Gradient Estimation and Variance Reduction in Stochastic and Deterministic Models

It seems that in the current age, computers, computation, and data have an increasingly important role to play in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence,…

Machine Learning · Computer Science 2024-05-15 Ronan Keane

Efficient Learning of Generative Models via Finite-Difference Score Matching

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a…

Machine Learning · Computer Science 2020-11-26 Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu