
Related papers: An iterative K-FAC algorithm for Deep Learning

200 papers

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's…

Machine Learning · Computer Science 2020-06-09 James Martens, Roger Grosse

Second-order optimization methods have the ability to accelerate convergence by modifying the gradient through the curvature matrix. There have been many attempts to use second-order optimization methods for training deep neural networks.…

Machine Learning · Computer Science 2020-11-24 Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an…

Machine Learning · Computer Science 2020-07-03 J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster

Several studies have shown the ability of natural gradient descent to minimize the objective function more efficiently than ordinary gradient descent based methods. However, the bottleneck of this approach for training deep neural networks…

Neural and Evolutionary Computing · Computer Science 2022-10-17 Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran

Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of the loss landscape. However, this comes at the expense of a high computational burden. In this work, we…

Machine Learning · Computer Science 2025-11-12 Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

This paper advances the computational efficiency of Deep Hedging frameworks through the novel integration of Kronecker-Factored Approximate Curvature (K-FAC) optimization. While recent literature has established Deep Hedging as a…

Statistical Finance · Quantitative Finance 2024-11-25 Tsogt-Ochir Enkhbayar

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to…

Machine Learning · Statistics 2016-05-25 Roger Grosse, James Martens

Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence…

Machine Learning · Computer Science 2025-07-08 Felix Dangel, Bálint Mucsányi, Tobias Weber, Runa Eschenhagen

The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with weight-sharing. Kronecker-Factored Approximate Curvature…

Machine Learning · Computer Science 2024-01-12 Runa Eschenhagen, Alexander Immer, Richard E. Turner, Frank Schneider, Philipp Hennig

In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Stochastic Gradient Descent. In particular, starting with Adagrad, a seemingly endless line of research…

Machine Learning · Computer Science 2020-12-08 Nikolaos Tselepidis, Jonas Kohler, Antonio Orvieto

Using second-order optimization methods for training deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC) (George et al., 2018), proposes an…

Machine Learning · Computer Science 2020-11-30 Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu

Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful…

Machine Learning · Computer Science 2022-06-13 Frederik Benzing

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a…

Machine Learning · Computer Science 2018-08-31 Kevin Luk, Roger Grosse

Distributed training with synchronous stochastic gradient descent (SGD) on GPU clusters has been widely used to accelerate the training process of deep models. However, SGD only utilizes the first-order gradient in model parameter updates,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-15 Shaohuai Shi, Lin Zhang, Bo Li

K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very…

Machine Learning · Computer Science 2022-11-28 Constantin Octavian Puiu

K-FAC (arXiv:1503.05671, arXiv:1602.01407) is a tractable implementation of Natural Gradient (NG) for Deep Learning (DL), whose bottleneck is computing the inverses of the so-called "Kronecker-Factors" (K-factors). RS-KFAC…

Machine Learning · Computer Science 2023-09-13 Constantin Octavian Puiu

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the…

Machine Learning · Computer Science 2021-07-27 Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent

Many hardware proposals have aimed to accelerate inference in AI workloads. Less attention has been paid to hardware acceleration of training, despite the enormous societal impact of rapid training of AI models. Physics-based computers,…

The second-order optimization methods, notably the D-KFAC (Distributed Kronecker Factored Approximate Curvature) algorithms, have gained traction on accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC…

Machine Learning · Computer Science 2022-07-01 Lin Zhang, Shaohuai Shi, Wei Wang, Bo Li

In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch. However, for both training loss and testing error, recent results analyzing…

Machine Learning · Computer Science 2021-04-21 Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney