English
Related papers

Related papers: Angle based dynamic learning rate for gradient des…

200 papers

We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.…

Machine Learning · Computer Science 2024-01-09 Min-Kook Suh , Seung-Woo Seo

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin , Robert Cornish , David Martinez Rubio , Mark Schmidt , Frank Wood

Any gradient descent optimization requires to choose a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut , Satya Gorti

Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of…

Machine Learning · Computer Science 2020-08-04 Rahul Yedida , Snehanshu Saha , Tejas Prashanth

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre , Marcin Moczulski , Yoshua Bengio

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic , Alexandre Bouchard-Côté , Trevor Campbell

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains neural nerwork with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen , Hongzhi Wang , Chenmin Ba

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

In this paper, we describe a new way to get convergence rates for optimal methods in smooth (strongly) convex optimization tasks. Our approach is based on results for tasks where gradients have nonrandom small noises. Unlike previous…

Optimization and Control · Mathematics 2020-07-14 Darina Dvinskikh , Alexander Tyurin , Alexander Gasnikov , Sergey Omelchenko

In this paper, a new gradient-based optimization approach by automatically adjusting the learning rate is proposed. This approach can be applied to design non-adaptive learning rate and adaptive learning rate. Firstly, I will introduce the…

Machine Learning · Computer Science 2022-07-07 Xin Cao

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Tree-based models are widely recognized for their interpretability and have proven effective in various application domains, particularly in high-stakes domains. However, learning decision trees (DTs) poses a significant challenge due to…

Machine Learning · Computer Science 2026-03-13 Sascha Marton

Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly…

Machine Learning · Computer Science 2021-07-20 Zhou Shao , Tong Lin

As deep learning models and datasets rapidly scale up, network training is extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution.…

Machine Learning · Computer Science 2022-08-02 Zixuan Jiang , Jiaqi Gu , Mingjie Liu , David Z. Pan

We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate automatically adjusted by an 'energy'…

Optimization and Control · Mathematics 2022-03-24 Hailiang Liu , Xuping Tian

In this paper, we propose a new way to obtain optimal convergence rates for smooth stochastic (strong) convex optimization tasks. Our approach is based on results for optimization tasks where gradients have nonrandom noise. In contrast to…

Optimization and Control · Mathematics 2020-04-16 Darina Dvinskikh , Alexander Tyurin , Alexander Gasnikov , Sergey Omelchenko

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray , David Saad

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient…

Machine Learning · Computer Science 2012-12-27 Matthew D. Zeiler

In this work, we propose an optimization algorithm which we call norm-adapted gradient descent. This algorithm is similar to other gradient-based optimization algorithms like Adam or Adagrad in that it adapts the learning rate of stochastic…

Machine Learning · Computer Science 2020-10-14 David Sprunger

Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable.…

Machine Learning · Computer Science 2024-08-20 Sascha Marton , Stefan Lüdtke , Christian Bartelt , Heiner Stuckenschmidt
‹ Prev 1 2 3 10 Next ›