English
Related papers

Related papers: Reparametrizing gradient descent

200 papers

Any gradient descent optimization requires to choose a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to an ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut , Satya Gorti

In this paper we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type of methods, our algorithms are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of…

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin , Robert Cornish , David Martinez Rubio , Mark Schmidt , Frank Wood

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has…

Machine Learning · Computer Science 2017-01-31 Diederik P. Kingma , Jimmy Ba

In the recent years, various gradient descent algorithms including the methods of gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp) and adaptive moment estimation (Adam)…

Machine Learning · Computer Science 2024-09-19 Abel C. H. Chen

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

In stochastic optimization, a common tool to deal sequentially with large sample is to consider the well-known stochastic gradient algorithm. Nevertheless, since the stepsequence is the same for each direction, this can lead to bad results…

Optimization and Control · Mathematics 2023-03-03 Antoine Godichon-Baggioni , Pierre Tarrago

The stochastic gradient descent (SGD) optimizers are generally used to train the convolutional neural networks (CNNs). In recent years, several adaptive momentum based SGD optimizers have been introduced, such as Adam, diffGrad, Radam and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Shiv Ram Dubey , Satish Kumar Singh , Bidyut Baran Chaudhuri

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen , Dongruo Zhou , Yiqi Tang , Ziyan Yang , Yuan Cao , Quanquan Gu

Feedback alignment algorithms are an alternative to backpropagation to train neural networks, whereby some of the partial derivatives that are required to compute the gradient are replaced by random terms. This essentially transforms the…

Machine Learning · Computer Science 2023-06-06 Dominique Chu , Florian Bacho

We present a first-order method for solving constrained optimization problems. The method is derived from our previous work, a modified search direction method inspired by singular value decomposition. In this work, we simplify its…

Optimization and Control · Mathematics 2023-02-24 Long Chen , Kai-Uwe Bletzinger , Nicolas R. Gauger , Yinyu Ye

Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization…

Machine Learning · Statistics 2021-04-20 Rachel Ward , Xiaoxia Wu , Leon Bottou

Gradient descent based optimization methods are the methods of choice to train deep neural networks in machine learning. Beyond the standard gradient descent method, also suitable modified variants of standard gradient descent involving…

Optimization and Control · Mathematics 2025-04-29 Steffen Dereich , Arnulf Jentzen , Adrian Riekert

In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural…

Optimization and Control · Mathematics 2023-11-08 Haochuan Li , Alexander Rakhlin , Ali Jadbabaie

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson , Rebecca Roelofs , Mitchell Stern , Nathan Srebro , Benjamin Recht

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni , Gianluca Maguolo , Alessandra Lumini

It is well known that we need to choose the hyper-parameters in Momentum, AdaGrad, AdaDelta, and other alternative stochastic optimizers. While in many cases, the hyper-parameters are tuned tediously based on experience becoming more of an…

Machine Learning · Computer Science 2022-04-05 Jun Lu

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Adaptive gradient methods like AdaGrad are widely used in optimizing neural networks. Yet, existing convergence guarantees for adaptive gradient methods require either convexity or smoothness, and, in the smooth setting, only guarantee…

Machine Learning · Computer Science 2019-10-22 Xiaoxia Wu , Simon S. Du , Rachel Ward

Optimization problems in disciplines such as machine learning are commonly solved with iterative methods. Gradient descent algorithms find local minima by moving along the direction of steepest descent while Newton's method takes into…

Quantum Physics · Physics 2018-08-20 Patrick Rebentrost , Maria Schuld , Leonard Wossnig , Francesco Petruccione , Seth Lloyd
‹ Prev 1 2 3 10 Next ›