Related papers: Natural Hypergradient Descent: Algorithm Design, C…
Natural gradient descent (NGD) is a powerful optimization technique for machine learning, but the computational complexity of the inverse Fisher information matrix limits its application in training deep neural networks. To overcome this…
Natural Gradient Descent, a second-order optimization method motivated by information geometry, uses the Fisher Information Matrix instead of the Hessian, which is typically used. However, in many cases, the Fisher Information…
Natural gradient descent (NGD) has provided deep insights and powerful tools for deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as the network structure grows large and complex. This…
We propose efficient numerical schemes for implementing the natural gradient descent (NGD) for a broad range of metric spaces with applications to PDE-based optimization problems. Our technique represents the natural gradient direction as a…
Natural gradient descent is an optimization method traditionally motivated from the perspective of information geometry, and works well for many applications as an alternative to stochastic gradient descent. In this paper we critically…
Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by…
We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks.…
Bilevel optimization has been widely used in many machine learning applications such as hyperparameter optimization and meta-learning. Recently, many simple stochastic gradient descent (SGD) type algorithms (without using momentum and…
The optimization problem, which aims to find the global minimum of a given cost function, is one of the central problems in science and engineering. Various numerical methods have been proposed to solve this problem, among which the…
Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. The practical implementation of the gradient descent method to bilevel optimization encounters computational…
This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix…
Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method…
The essential difficulty of gradient-based bilevel optimization using implicit differentiation is to estimate the inverse Hessian vector product with respect to neural network parameters. This paper proposes to tackle this problem by the…
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- or/and…
Large scale optimization problems are ubiquitous in machine learning and data analysis and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling, as a way to either speed up the…
We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…
Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning. Practical stochastic…
Training deep neural networks is a high-dimensional and highly non-convex optimization problem. The stochastic gradient descent (SGD) algorithm and its variations are the current state-of-the-art solvers for this task. However, due to…
Natural Gradient Descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed…
Many real-world datasets are collected sequentially over time and often exhibit skewed class distributions, resulting in imbalanced data streams. While existing approaches have explored several strategies, such as resampling and reweighting,…