Related papers: Natural Hypergradient Descent: Algorithm Design, C…

Reconstructing Deep Neural Networks: Unleashing the Optimization Potential of Natural Gradient Descent

Natural gradient descent (NGD) is a powerful optimization technique for machine learning, but the computational complexity of the inverse Fisher information matrix limits its application in training deep neural networks. To overcome this…

Machine Learning · Computer Science 2024-12-11 Weihua Liu, Said Boumaraf, Jianwu Li, Chaochao Lin, Xiabi Liu, Lijuan Niu, Naoufel Werghi

Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis

Natural Gradient Descent, a second-order optimization method motivated by information geometry, uses the Fisher Information Matrix instead of the more commonly used Hessian. However, in many cases, the Fisher Information…

Machine Learning · Computer Science 2023-03-10 Rajesh Shrestha

A Novel Structured Natural Gradient Descent for Deep Learning

Natural gradient descent (NGD) has provided deep insights and powerful tools for deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as the network structure grows large and complex. This…

Machine Learning · Computer Science 2021-09-22 Weihua Liu, Xiabi Liu

Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems

We propose efficient numerical schemes for implementing the natural gradient descent (NGD) for a broad range of metric spaces with applications to PDE-based optimization problems. Our technique represents the natural gradient direction as a…

Optimization and Control · Mathematics 2023-01-12 Levon Nurbekyan, Wanzhou Lei, Yunan Yang

New insights and perspectives on the natural gradient method

Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and works well for many applications as an alternative to stochastic gradient descent. In this paper we critically…

Machine Learning · Computer Science 2020-09-22 James Martens

Thermodynamic Natural Gradient Descent

Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by…

Machine Learning · Computer Science 2024-05-24 Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles

Natural gradient descent with momentum

We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks.…

Machine Learning · Computer Science 2026-04-20 Anthony Nouy, Agustín Somacal

A New Simple Stochastic Gradient Descent Type Algorithm With Lower Computational Complexity for Bilevel Optimization

Bilevel optimization has been widely used in many machine learning applications such as hyperparameter optimization and meta learning. Recently, many simple stochastic gradient descent (SGD) type algorithms (without using momentum and…

Optimization and Control · Mathematics 2023-06-21 Haimei Huo, Risheng Liu, Zhixun Su

Generalization to the Natural Gradient Descent

The optimization problem, which aims to find the global minimum of a given cost function, is one of the central problems in science and engineering. Various numerical methods have been proposed to solve this problem, among which the…

Optimization and Control · Mathematics 2022-10-07 Shaojun Dong, Fengyu Le, Meng Zhang, Si-Jing Tao, Chao Wang, Yong-Jian Han, Guo-Ping Guo

qNBO: quasi-Newton Meets Bilevel Optimization

Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. The practical implementation of the gradient descent method to bilevel optimization encounters computational…

Machine Learning · Computer Science 2025-02-04 Sheng Fang, Yong-Jin Liu, Wei Yao, Chengming Yu, Jin Zhang

TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix…

Machine Learning · Computer Science 2022-03-04 Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

A block coordinate descent optimizer for classification problems exploiting convexity

Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method…

Machine Learning · Computer Science 2020-06-19 Ravi G. Patel, Nathaniel A. Trask, Mamikon A. Gulian, Eric C. Cyr

Nystrom Method for Accurate and Scalable Implicit Differentiation

The essential difficulty of gradient-based bilevel optimization using implicit differentiation is to estimate the inverse Hessian vector product with respect to neural network parameters. This paper proposes to tackle this problem by the…

Machine Learning · Computer Science 2023-02-21 Ryuichiro Hataya, Makoto Yamada

On the Convergence Theory for Hessian-Free Bilevel Algorithms

Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and…

Machine Learning · Computer Science 2022-06-07 Daouda Sow, Kaiyi Ji, Yingbin Liang

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…

Optimization and Control · Mathematics 2016-02-29 Farbod Roosta-Khorasani, Michael W. Mahoney

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems

Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning. Practical stochastic…

Optimization and Control · Mathematics 2023-11-08 Tommaso Giovannelli, Griffin Dean Kent, Luis Nunes Vicente

Distributed Hessian-Free Optimization for Deep Neural Network

Training a deep neural network is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variations are the current state-of-the-art solvers for this task. However, due to…

Machine Learning · Computer Science 2017-01-17 Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, Martin Takáč

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Natural Gradient Descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed…

Machine Learning · Statistics 2022-01-12 Ryo Karakida, Kazuki Osawa

Harmonized Gradient Descent for Class Imbalanced Data Stream Online Learning

Many real-world data are sequentially collected over time and often exhibit skewed class distributions, resulting in imbalanced data streams. While existing approaches have explored several strategies, such as resampling and reweighting,…

Machine Learning · Computer Science 2025-08-18 Han Zhou, Hongpeng Yin, Xuanhong Deng, Yuyu Huang, Hao Ren