Related papers: A Multi-Batch L-BFGS Method for Machine Learning

A Robust Multi-Batch L-BFGS Method for Machine Learning

This paper describes an implementation of the L-BFGS method designed to deal with two adversarial situations. The first occurs in distributed computing environments where some of the computational nodes devoted to the evaluation of the…

Optimization and Control · Mathematics 2019-08-28 Albert S. Berahas , Martin Takáč

A Progressive Batching L-BFGS Method for Machine Learning

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the…

Optimization and Control · Mathematics 2018-05-31 Raghu Bollapragada , Dheevatsa Mudigere , Jorge Nocedal , Hao-Jun Michael Shi , Ping Tak Peter Tang

Asynchronous Parallel Stochastic Quasi-Newton Methods

Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, the second-order quasi-Newton methods start to draw attention due to…

Optimization and Control · Mathematics 2020-11-03 Qianqian Tong , Guannan Liang , Xingyu Cai , Chunjiang Zhu , Jinbo Bi

A Parallel SGD method with Strong Convergence

This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node using the data residing in it) for finding the direction in each…

Machine Learning · Computer Science 2013-11-05 Dhruv Mahajan , S. Sathiya Keerthi , S. Sundararajan , Leon Bottou

On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches

This paper proposes a framework of L-BFGS based on the (approximate) second-order information with stochastic batches, as a novel approach to the finite-sum minimization problems. Different from the classical L-BFGS where stochastic batches…

Machine Learning · Computer Science 2018-07-17 Jie Liu , Yu Rong , Martin Takac , Junzhou Huang

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function…

Machine Learning · Computer Science 2016-04-20 Jakub Konečný , Jie Liu , Peter Richtárik , Martin Takáč

Improving the convergence of SGD through adaptive batch sizes

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert , Shrey Shah

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization

Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By…

Machine Learning · Statistics 2018-06-08 Umut Şimşekli , Çağatay Yıldız , Thanh Huy Nguyen , Gaël Richard , A. Taylan Cemgil

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an…

Machine Learning · Computer Science 2014-10-20 Jakub Konečný , Jie Liu , Peter Richtárik , Martin Takáč

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization

Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS…

Machine Learning · Computer Science 2023-07-27 Yue Niu , Zalan Fabian , Sunwoo Lee , Mahdi Soltanolkotabi , Salman Avestimehr

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Stochastic gradient descent~(SGD) and its variants have been the dominating optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of…

Machine Learning · Statistics 2024-04-16 Shen-Yi Zhao , Chang-Wei Shi , Yin-Peng Xie , Wu-Jun Li

Stochastic Block BFGS: Squeezing More Curvature out of Data

We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. In our method, the estimate of the inverse Hessian matrix that is maintained by it, is…

Optimization and Control · Mathematics 2016-04-01 Robert M. Gower , Donald Goldfarb , Peter Richtárik

Quasi-Newton Optimization Methods For Deep Learning Applications

Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement…

Machine Learning · Computer Science 2019-09-06 Jacob Rafati , Roummel F. Marcia

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul , Yann LeCun

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say $32$-$512$ data points, is…

Machine Learning · Computer Science 2017-02-13 Nitish Shirish Keskar , Dheevatsa Mudigere , Jorge Nocedal , Mikhail Smelyanskiy , Ping Tak Peter Tang

Block stochastic gradient iteration for convex and nonconvex optimization

The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions, or solve a stochastic optimization problem, to a moderate accuracy. The block coordinate descent/update (BCD)…

Optimization and Control · Mathematics 2015-11-23 Yangyang Xu , Wotao Yin

A Stochastic Quasi-Newton Method for Large-Scale Optimization

The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi- Newton updating techniques for deterministic optimization leads to noisy curvature…

Optimization and Control · Mathematics 2015-02-19 R. H. Byrd , S. L. Hansen , J. Nocedal , Y. Singer

Efficient Stochastic BFGS methods Inspired by Bayesian Principles

Quasi-Newton methods are ubiquitous in deterministic local search due to their efficiency and low computational cost. This class of methods uses the history of gradient evaluations to approximate second-order derivatives. However, only…

Optimization and Control · Mathematics 2025-11-24 André Carlon , Luis Espath , Raúl Tempone

Deep Reinforcement Learning via L-BFGS Optimization

Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and…

Machine Learning · Computer Science 2019-04-18 Jacob Rafati , Roummel F. Marcia

Gradient and Variable Tracking with Multiple Local SGD for Decentralized Non-Convex Learning

Stochastic distributed optimization methods that solve an optimization problem over a multi-agent network have played an important role in a variety of large-scale signal processing and machine leaning applications. Among the existing…

Optimization and Control · Mathematics 2023-02-06 Songyang Ge , Tsung-Hui Chang