Related papers: Do optimization methods in deep learning applicati…
Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…
Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…
In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks going deeper and datasets…
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…
Stochastic gradient-based descent (SGD), have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence…
In recent years, deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods…
We address the application of stochastic optimization methods for the simultaneous control of parameter-dependent systems. In particular, we focus on the classical Stochastic Gradient Descent (SGD) approach of Robbins and Monro, and on the…
Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…
Focusing on the practical configuration needs of optimization algorithms in deep learning, this article concentrates on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. It systematically analyzes the core advantages,…
In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…
Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…
State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance…
A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better…
This study examines several renowned gradient-based optimization techniques and focuses on their computational efficiency and precision. In the study, the steepest descent, conjugate gradient (Fletcher-Reeves and Polak-Ribiere variants),…
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…
Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…
Large-scale constrained optimization problems are at the core of many tasks in control, signal processing, and machine learning. Notably, problems with functional constraints arise when, beyond a performance{\nobreakdash-}centric goal…
Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…
Stochastic gradient descent (\textsc{Sgd}) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…