Related papers: Eva: A General Vectorized Approximation Framework …

200 papers

Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful…

Machine Learning · Computer Science 2022-06-13 Frederik Benzing

Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size. Previous approaches try to solve this problem by varying the learning rate and batch size…

Machine Learning · Computer Science 2019-04-02 Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka

Approximate second-order optimization methods often exhibit poorer generalization compared to first-order approaches. In this work, we look into this issue through the lens of the loss landscape and find that existing second-order methods…

Machine Learning · Computer Science 2025-06-25 Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee

Second-order optimization methods offer notable advantages in training deep neural networks by utilizing curvature information to achieve faster convergence. However, traditional second-order techniques are computationally prohibitive,…

Machine Learning · Computer Science 2024-10-04 James Vo

Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by…

Emerging Technologies · Computer Science 2025-12-08 Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun

Second order stochastic optimizers allow parameter update step size and direction to adapt to loss curvature, but have traditionally required too much memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018] introduced a…

Machine Learning · Statistics 2023-06-01 Jonathan Mei, Alexander Moreno, Luke Walters

A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions. Our analysis is much simpler in contrast to a similar…

Optimization and Control · Mathematics 2017-04-14 Mojmir Mutny

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to…

Machine Learning · Computer Science 2021-09-21 J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models. In this study, toward training at even larger scales, we identify a specific…

Machine Learning · Computer Science 2024-06-11 Satoki Ishikawa, Ryo Karakida

It has long been a goal to efficiently compute and use second order information on a function ($f$) to assist in numerical approximations. Here it is shown how, using only basic physics and a numerical approximation, such information can be…

Machine Learning · Computer Science 2021-05-31 Michael F. Zimmer

Recently, Stochastic Variational Inference (SVI) has been increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models. It optimizes the variational objective with stochastic optimization,…

Machine Learning · Computer Science 2022-03-16 Minta Liu, Suliang Bu

Stochastic gradient descent (SGD) now acts as a fundamental part of optimization in current machine learning. Meanwhile, deep learning architectures have shown outstanding performance in a wide range of fields, such as natural language…

Machine Learning · Computer Science 2026-01-27 Zhao Song, Song Yue

Training deep neural networks (DNNs) used in modern machine learning is computationally expensive. Machine learning scientists, therefore, rely on stochastic first-order methods for training, coupled with significant hand-tuning, to obtain…

Machine Learning · Computer Science 2023-07-24 Eric Silk, Swarnita Chakraborty, Nairanjana Dasgupta, Anand D. Sarwate, Andrew Lumsdaine, Tony Chiang

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored…

Machine Learning · Statistics 2017-12-01 Naman Agarwal, Brian Bullins, Elad Hazan

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA)…

Machine Learning · Statistics 2017-06-16 Simone Scardapane, Paolo Di Lorenzo

Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years,…

Machine Learning · Computer Science 2024-03-06 Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically the Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient…

Machine Learning · Computer Science 2021-11-09 Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou

This paper advances the computational efficiency of Deep Hedging frameworks through the novel integration of Kronecker-Factored Approximate Curvature (K-FAC) optimization. While recent literature has established Deep Hedging as a…

Statistical Finance · Quantitative Finance 2024-11-25 Tsogt-Ochir Enkhbayar

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model…

Optimization and Control · Mathematics 2026-05-11 Yunlang Zhu, Lingjun Guo, Zahra Khatti, Xiaoyi Qu, Chia-Yuan Wu, Lara Zebiane, Frank E. Curtis

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…
