Related papers: Accelerated Natural Gradient Method for Parametric…
Optimizers that further adjust the scale of gradient, such as Adam, Natural Gradient (NG), etc., despite widely concerned and used by the community, are often found poor generalization performance, compared with Stochastic Gradient Descent…
We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks.…
Optimization problem, which is aimed at finding the global minimal value of a given cost function, is one of the central problem in science and engineering. Various numerical methods have been proposed to solve this problem, among which the…
We propose efficient numerical schemes for implementing the natural gradient descent (NGD) for a broad range of metric spaces with applications to PDE-based optimization problems. Our technique represents the natural gradient direction as a…
We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions. We also introduce a new first-order algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster…
In this work we propose a differential geometric motivation for Nesterov's accelerated gradient method (AGM) for strongly-convex problems. By considering the optimization procedure as occurring on a Riemannian manifold with a natural…
Gradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how…
We consider the maximum mean discrepancy ($\mathrm{MMD}$) GAN problem and propose a parametric kernelized gradient flow that mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN. We show that this flow provides a descent…
We propose a Riemannian version of Nesterov's Accelerated Gradient algorithm (RAGD), and show that for geodesically smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as…
We provide a novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model. To this day, Nesterov's Accelerated Gradient Descent (AGD) and variations…
Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…
Orthogonal Gradient Descent (OGD) has emerged as a powerful method for continual learning. However, its Euclidean projections do not leverage the underlying information-geometric structure of the problem, which can lead to suboptimal…
Natural Gradient Descent (NGD) has emerged as a promising optimization algorithm for training neural network-based solvers for partial differential equations (PDEs), such as Physics-Informed Neural Networks (PINNs). However, its practical…
We propose an optimization method for minimizing the finite sums of smooth convex functions. Our method incorporates an accelerated gradient descent (AGD) and a stochastic variance reduction gradient (SVRG) in a mini-batch setting. Unlike…
While Nesterov's Accelerated Gradient Descent (AGD) efficiently solves constrained problems when the constraint set $X \subseteq \mathbb{R}^n$ is simple and easy to project onto, it remains an open question whether function-constrained…
In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--namely, the need to compute or approximate Hessian…
Gradient Descent (GD) is a ubiquitous algorithm for finding the optimal solution to an optimization problem. For reduced computational complexity, the optimal solution $\mathrm{x^*}$ of the optimization problem must be attained in a minimum…
We study a type of Riemannian gradient descent (RGD) algorithm, designed through Riemannian preconditioning, for optimization on $\mathcal{M}_k^{m\times n}$ -- the set of $m\times n$ real matrices with a fixed rank $k$. Our analysis is…
Nesterov's accelerated gradient descent method (AGD) is a seminal deterministic first-order method known to achieve the optimal order of iteration complexity for solving convex smooth optimization problems. Two distinct sequences of…
The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…