Related papers: Learning with Random Learning Rates

Learning Rate Dropout

The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent…

Computer Vision and Pattern Recognition · Computer Science 2019-12-06 Huangxing Lin , Weihong Zeng , Xinghao Ding , Yue Huang , Chenxi Huang , John Paisley

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains neural nerwork with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen , Hongzhi Wang , Chenmin Ba

No More Pesky Learning Rates

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul , Sixin Zhang , Yann LeCun

Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization

We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.…

Machine Learning · Computer Science 2024-01-09 Min-Kook Suh , Seung-Woo Seo

Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

In the recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization

The learning rate in stochastic gradient methods is a critical hyperparameter that is notoriously costly to tune via standard grid search, especially for training modern large-scale models with billions of parameters. We identify a…

Machine Learning · Computer Science 2026-02-17 Amit Attia , Tomer Koren

Deep Reinforcement Learning using Cyclical Learning Rates

Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-08-05 Ralf Gulde , Marc Tuscher , Akos Csiszar , Oliver Riedel , Alexander Verl

Reinforcement Learning for Learning Rate Control

Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks. It is observed…

Machine Learning · Computer Science 2017-06-01 Chang Xu , Tao Qin , Gang Wang , Tie-Yan Liu

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin , Robert Cornish , David Martinez Rubio , Mark Schmidt , Frank Wood

AutoGD: Automatic Learning Rate Selection for Gradient Descent

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic , Alexandre Bouchard-Côté , Trevor Campbell

LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…

Machine Learning · Computer Science 2020-08-04 Rahul Yedida , Snehanshu Saha , Tejas Prashanth

Learning Rate Adaptation for Federated and Differentially Private Learning

We propose an algorithm for the adaptation of the learning rate for stochastic gradient descent (SGD) that avoids the need for validation set use. The idea for the adaptiveness comes from the technique of extrapolation: to get an estimate…

Machine Learning · Statistics 2020-08-28 Antti Koskela , Antti Honkela

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami , Martin Jaggi , Sebastian U. Stich

A Hessian-informed hyperparameter optimization for differential learning rate

Differential learning rate (DLR), a technique that applies different learning rates to different model parameters, has been widely used in deep learning and achieved empirical success via its various forms. For example, parameter-efficient…

Machine Learning · Computer Science 2025-05-20 Shiyun Xu , Zhiqi Bu , Yiliang Zhang , Ian Barnett

Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies for different models and…

Machine Learning · Statistics 2019-08-22 Tomer Lancewicki , Selcuk Kopru

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul , Yann LeCun

Gradient descent revisited via an adaptive online learning rate

Any gradient descent optimization requires to choose a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut , Satya Gorti

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang , Suyog Gupta , Xiangru Lian , Ji Liu

AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio

It is well known that we need to choose the hyper-parameters in Momentum, AdaGrad, AdaDelta, and other alternative stochastic optimizers. While in many cases, the hyper-parameters are tuned tediously based on experience becoming more of an…

Machine Learning · Computer Science 2022-04-05 Jun Lu