Related papers: Do optimization methods in deep learning applicati…

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

Machine Learning · Computer Science 2026-04-15 Tong Zhang, Jiangning Zhang, Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu, Yabiao Wang, Yong Liu, Shuicheng Yan

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

A Comparison of Optimization Algorithms for Deep Learning

In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks go deeper and datasets…

Machine Learning · Computer Science 2020-08-05 Derya Soydaner

Optimization Methods for Large-Scale Machine Learning

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Machine Learning · Statistics 2018-02-12 Léon Bottou, Frank E. Curtis, Jorge Nocedal

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Stochastic gradient-based descent (SGD) methods have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence…

Machine Learning · Computer Science 2025-07-03 Di Zhang, Yihang Zhang

Optimization Methods in Deep Learning: A Comprehensive Overview

In recent years, deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods…

Machine Learning · Computer Science 2023-04-25 David Shulman

Stochastic optimization methods for the simultaneous control of parameter-dependent systems

We address the application of stochastic optimization methods for the simultaneous control of parameter-dependent systems. In particular, we focus on the classical Stochastic Gradient Descent (SGD) approach of Robbins and Monro, and on the…

Optimization and Control · Mathematics 2023-02-08 Umberto Biccari, Ana Navarro-Quiles, Enrique Zuazua

Learning complexity of gradient descent and conjugate gradient algorithms

Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…

Optimization and Control · Mathematics 2024-12-19 Xianqi Jiao, Jia Liu, Zhiping Chen

Gradient Descent Algorithm Survey

Focusing on the practical configuration needs of optimization algorithms in deep learning, this article concentrates on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. It systematically analyzes the core advantages,…

Machine Learning · Computer Science 2025-11-27 Deng Fucheng, Wang Wanjie, Gong Ao, Wang Xiaoqi, Wang Fan

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods, have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance

Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance…

Machine Learning · Computer Science 2023-09-22 Hao Chen, Yusen Wu, Phuong Nguyen, Chao Liu, Yelena Yesha

Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision

A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better…

Machine Learning · Computer Science 2019-05-27 Sathya N. Ravi, Tuan Dinh, Vishnu Lokhande, Vikas Singh

Comparative Analysis of Gradient-Based Optimization Techniques Using Multidimensional Surface 3D Visualizations and Initial Point Sensitivity

This study examines several renowned gradient-based optimization techniques and focuses on their computational efficiency and precision. In the study, the steepest descent, conjugate gradient (Fletcher-Reeves and Polak-Ribiere variants),…

Optimization and Control · Mathematics 2025-05-12 Saeed Asadi, Sonia Gharibzadeh, Hajar Kazemi Naeini, Masoud Reihanifar, Morteza Rahimi, Shiva Zangeneh, Aseel Smerat, Lazim Abdullah

Optimizing ML Training with Metagradient Descent

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

Optimization via First-Order Switching Methods: Skew-Symmetric Dynamics and Optimistic Discretization

Large-scale constrained optimization problems are at the core of many tasks in control, signal processing, and machine learning. Notably, problems with functional constraints arise when, beyond a performance-centric goal…

Optimization and Control · Mathematics 2025-05-15 Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv, Shunhua Jiang, Jian Li

Optimal Adaptive and Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng, Yi Cheng, Guanghui Lan