English
Related papers

Related papers: Optimization Methods in Deep Learning: A Comprehen…

200 papers

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson , Rebecca Roelofs , Mitchell Stern , Nathan Srebro , Benjamin Recht

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and…

Machine Learning · Computer Science 2021-08-27 Aaron Defazio , Samy Jelassi

Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various…

Machine Learning · Computer Science 2024-08-21 Huixiu Jiang , Ling Yang , Yu Bao , Rutong Si , Sikun Yang

Gradient descent based optimization methods are the methods of choice to train deep neural networks in machine learning. Beyond the standard gradient descent method, also suitable modified variants of standard gradient descent involving…

Optimization and Control · Mathematics 2025-04-29 Steffen Dereich , Arnulf Jentzen , Adrian Riekert

Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared…

Machine Learning · Computer Science 2019-04-22 Liangchen Luo , Yuanhao Xiong , Yan Liu , Xu Sun

In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks going deeper and datasets…

Machine Learning · Computer Science 2020-08-05 Derya Soydaner

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts would like to adaptively control…

Machine Learning · Computer Science 2020-10-22 Jie Liu , Chen Lin , Chuming Li , Lu Sheng , Ming Sun , Junjie Yan , Wanli Ouyang

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili , Changliu Liu

Despite the omnipresent use of stochastic gradient descent (SGD) optimization methods in the training of deep neural networks (DNNs), it remains, in basically all practically relevant scenarios, a fundamental open problem to provide a…

Machine Learning · Computer Science 2025-03-04 Thang Do , Arnulf Jentzen , Adrian Riekert

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter,…

Machine Learning · Computer Science 2019-09-13 Rohan Anil , Vineet Gupta , Tomer Koren , Yoram Singer

Stochastic Gradient Decent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic…

This paper deals with nonconvex stochastic optimization problems in deep learning and provides appropriate learning rates with which adaptive learning rate optimization algorithms, such as Adam and AMSGrad, can approximate a stationary…

Optimization and Control · Mathematics 2020-11-24 Hideaki Iiduka

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Machine Learning · Statistics 2018-02-12 Léon Bottou , Frank E. Curtis , Jorge Nocedal

Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence…

Machine Learning · Computer Science 2024-06-21 Dongruo Zhou , Jinghui Chen , Yuan Cao , Ziyan Yang , Quanquan Gu

The success of deep learning can be attributed to various factors such as increase in computational power, large datasets, deep convolutional neural networks, optimizers etc. Particularly, the choice of optimizer affects the generalization,…

Machine Learning · Computer Science 2021-09-10 Anirudh Maiya , Inumella Sricharan , Anshuman Pandey , Srinivas K. S

The stochastic gradient descent (SGD) optimizers are generally used to train the convolutional neural networks (CNNs). In recent years, several adaptive momentum based SGD optimizers have been introduced, such as Adam, diffGrad, Radam and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Shiv Ram Dubey , Satish Kumar Singh , Bidyut Baran Chaudhuri

The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically…

Machine Learning · Statistics 2017-07-03 Frank E. Curtis , Katya Scheinberg
‹ Prev 1 2 3 10 Next ›