Related papers: A Robust Multi-Batch L-BFGS Method for Machine Lea…

200 papers

The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each…

Optimization and Control · Mathematics 2016-10-26 Albert S. Berahas, Jorge Nocedal, Martin Takáč

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the…

Optimization and Control · Mathematics 2018-05-31 Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

This paper proposes a framework of L-BFGS based on the (approximate) second-order information with stochastic batches, as a novel approach to the finite-sum minimization problems. Different from the classical L-BFGS where stochastic batches…

Machine Learning · Computer Science 2018-07-17 Jie Liu, Yu Rong, Martin Takac, Junzhou Huang

Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS…

Machine Learning · Computer Science 2023-07-27 Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a…

Optimization and Control · Mathematics 2016-04-15 Philipp Moritz, Robert Nishihara, Michael I. Jordan

Motivated by the potential for parallel implementation of batch-based algorithms and the accelerated convergence achievable with approximated second-order information, a limited-memory version of the BFGS algorithm has been receiving…

Machine Learning · Computer Science 2023-03-07 Federico Zocco, Seán McLoone

Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. The practical implementation of the gradient descent method to bilevel optimization encounters computational…

Machine Learning · Computer Science 2025-02-04 Sheng Fang, Yong-Jin Liu, Wei Yao, Chengming Yu, Jin Zhang

Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By…

Machine Learning · Statistics 2018-06-08 Umut Şimşekli, Çağatay Yıldız, Thanh Huy Nguyen, Gaël Richard, A. Taylan Cemgil

This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or…

Optimization and Control · Mathematics 2021-09-10 Hao-Jun Michael Shi, Yuchen Xie, Richard Byrd, Jorge Nocedal

Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, the second-order quasi-Newton methods start to draw attention due to…

Optimization and Control · Mathematics 2020-11-03 Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi

We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the…

Machine Learning · Statistics 2010-11-30 Jin Yu, S. V. N. Vishwanathan, Simon Guenter, Nicol N. Schraudolph

In this paper, a modified BFGS algorithm is proposed. The modified BFGS matrix estimates a modified Hessian matrix which is a convex combination of an identity matrix for the steepest descent algorithm and a Hessian matrix for the Newton…

Optimization and Control · Mathematics 2025-11-14 Yaguang Yang

The classical convergence analysis of quasi-Newton methods assumes that the function and gradients employed at each iteration are exact. In this paper, we consider the case when there are (bounded) errors in both computations and establish…

Optimization and Control · Mathematics 2019-01-29 Yuchen Xie, Richard Byrd, Jorge Nocedal

Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement…

Machine Learning · Computer Science 2019-09-06 Jacob Rafati, Roummel F. Marcia

Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and…

Machine Learning · Computer Science 2019-04-18 Jacob Rafati, Roummel F. Marcia

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build…

Optimization and Control · Mathematics 2021-07-29 Albert S. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature…

Optimization and Control · Mathematics 2015-02-19 R. H. Byrd, S. L. Hansen, J. Nocedal, Y. Singer

Quasi-Newton methods are ubiquitous in deterministic local search due to their efficiency and low computational cost. This class of methods uses the history of gradient evaluations to approximate second-order derivatives. However, only…

Optimization and Control · Mathematics 2025-11-24 André Carlon, Luis Espath, Raúl Tempone

We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. In our method, the estimate of the inverse Hessian matrix that is maintained by it, is…

Optimization and Control · Mathematics 2016-04-01 Robert M. Gower, Donald Goldfarb, Peter Richtárik

Large-scale unconstrained optimization is a fundamental and important, yet not well-solved, class of problems in numerical optimization. The main challenge in designing an algorithm is to require a few storage locations or very inexpensive…

Optimization and Control · Mathematics 2020-01-24 Zheng Li, Shi Shu, Jian-Ping Zhang