Related papers: Statistical Adaptive Stochastic Gradient Methods

Using Statistics to Automate Stochastic Optimization

Despite the development of numerous adaptive optimizers, tuning the learning rate of stochastic gradient methods remains a major roadblock to obtaining good practical performance in machine learning. Rather than changing the learning rate…

Machine Learning · Statistics 2019-09-27 Hunter Lang, Pengchuan Zhang, Lin Xiao

Adaptive Learning Rates for Faster Stochastic Gradient Methods

In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods. Our first method (StoPS) is based on the classical Polyak step size (Polyak, 1987) and is an extension of the recent development of…

Machine Learning · Computer Science 2022-08-11 Samuel Horváth, Konstantin Mishchenko, Peter Richtárik

Stochastic Gradient Descent: Going As Fast As Possible But Not Faster

When applied to training deep neural networks, stochastic gradient descent (SGD) often incurs steady progression phases, interrupted by catastrophic episodes in which loss and gradient norm explode. A possible mitigation of such events is…

Machine Learning · Statistics 2017-09-06 Alice Schoenauer-Sebag, Marc Schoenauer, Michèle Sebag

No learning rates needed: Introducing SALSA -- Stable Armijo Line Search Adaptation

In recent studies, line search methods have been demonstrated to significantly enhance the performance of conventional stochastic gradient descent techniques across various datasets and architectures, while making an otherwise critical…

Machine Learning · Computer Science 2024-07-31 Philip Kenneweg, Tristan Kenneweg, Fabian Fumagalli, Barbara Hammer

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…

Signal Processing · Electrical Eng. & Systems 2020-07-10 Zhan Gao, Alec Koppel, Alejandro Ribeiro

Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

High sensitivity of neural architecture search (NAS) methods against their input such as step-size (i.e., learning rate) and search space prevents practitioners from applying them out-of-the-box to their own problems, albeit its purpose is…

Machine Learning · Computer Science 2019-05-22 Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida

No More Pesky Learning Rates

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul, Sixin Zhang, Yann LeCun

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

Learning-Rate-Free Learning: Dissecting D-Adaptation and Probabilistic Line Search

This paper explores two recent methods for learning rate optimisation in stochastic gradient descent: D-Adaptation (arXiv:2301.07733) and probabilistic line search (arXiv:1502.02846). These approaches aim to alleviate the burden of…

Machine Learning · Computer Science 2023-08-08 Max McGuinness

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is…

Machine Learning · Statistics 2024-02-20 Matteo Sordello, Niccolò Dalmasso, Hangfeng He, Weijie Su

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science 2020-03-04 Achraf Bahamou, Donald Goldfarb

Gradient-only line searches: An Alternative to Probabilistic Line Searches

Step sizes in neural network training are largely determined using predetermined rules such as fixed learning rates and learning rate schedules. These require user input or expensive global optimization strategies to determine their…

Machine Learning · Statistics 2020-04-07 Dominic Kafka, Daniel Wilke

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan

Speed learning on the fly

The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic…

Optimization and Control · Mathematics 2015-11-10 Pierre-Yves Massé, Yann Ollivier

SAAGs: Biased Stochastic Variance Reduction Methods for Large-scale Learning

Stochastic approximation is one of the effective approaches for dealing with large-scale machine learning problems, and recent research has focused on reducing the variance caused by the noisy approximations of the gradients. In this…

Machine Learning · Computer Science 2019-04-09 Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya

AdaS: Adaptive Scheduling of Stochastic Gradients

The choice of step-size used in Stochastic Gradient Descent (SGD) optimization is empirically selected in most training procedures. Moreover, the use of scheduled learning techniques such as Step-Decaying, Cyclical-Learning, and Warmup to…

Machine Learning · Computer Science 2020-06-12 Mahdi S. Hosseini, Konstantinos N. Plataniotis

ADASS: Adaptive Sample Selection for Training Acceleration

Stochastic gradient descent (SGD) and its variants, including some accelerated variants, have become popular for training in machine learning. However, in all existing SGD and its variants, the sample size in each iteration (epoch) of…

Machine Learning · Statistics 2019-09-18 Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Adaptive Sampling Strategies for Stochastic Optimization

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…

Optimization and Control · Mathematics 2017-11-01 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent

The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a…

Machine Learning · Computer Science 2025-05-29 Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell