English
Related papers

Related papers: Do optimization methods in deep learning applicati…

200 papers

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam,…

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks going deeper and datasets…

Machine Learning · Computer Science 2020-08-05 Derya Soydaner

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Machine Learning · Statistics 2018-02-12 Léon Bottou , Frank E. Curtis , Jorge Nocedal

Stochastic gradient-based descent (SGD), have long been central to training large language models (LLMs). However, their effectiveness is increasingly being questioned, particularly in large-scale applications where empirical evidence…

Machine Learning · Computer Science 2025-07-03 Di Zhang , Yihang Zhang

In recent years, deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods…

Machine Learning · Computer Science 2023-04-25 David Shulman

We address the application of stochastic optimization methods for the simultaneous control of parameter-dependent systems. In particular, we focus on the classical Stochastic Gradient Descent (SGD) approach of Robbins and Monro, and on the…

Optimization and Control · Mathematics 2023-02-08 Umberto Biccari , Ana Navarro-Quiles , Enrique Zuazua

Gradient Descent (GD) and Conjugate Gradient (CG) methods are among the most effective iterative algorithms for solving unconstrained optimization problems, particularly in machine learning and statistical modeling, where they are employed…

Optimization and Control · Mathematics 2024-12-19 Xianqi Jiao , Jia Liu , Zhiping Chen

Focusing on the practical configuration needs of optimization algorithms in deep learning, this article concentrates on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. It systematically analyzes the core advantages,…

Machine Learning · Computer Science 2025-11-27 Deng Fucheng , Wang Wanjie , Gong Ao , Wang Xiaoqi , Wang Fan

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen , Nam H. Nguyen , Dzung T. Phan , Jayant R. Kalagnanam , Katya Scheinberg

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami , Martin Jaggi , Sebastian U. Stich

Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance…

Machine Learning · Computer Science 2023-09-22 Hao Chen , Yusen Wu , Phuong Nguyen , Chao Liu , Yelena Yesha

A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better…

Machine Learning · Computer Science 2019-05-27 Sathya N. Ravi , Tuan Dinh , Vishnu Lokhande , Vikas Singh

This study examines several renowned gradient-based optimization techniques and focuses on their computational efficiency and precision. In the study, the steepest descent, conjugate gradient (Fletcher-Reeves and Polak-Ribiere variants),…

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom , Andrew Ilyas , Benjamin Chen , Axel Feldmann , William Moses , Aleksander Madry

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson , Rebecca Roelofs , Mitchell Stern , Nathan Srebro , Benjamin Recht

Large-scale constrained optimization problems are at the core of many tasks in control, signal processing, and machine learning. Notably, problems with functional constraints arise when, beyond a performance{\nobreakdash-}centric goal…

Optimization and Control · Mathematics 2025-05-15 Antesh Upadhyay , Sang Bin Moon , Abolfazl Hashemi

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv , Shunhua Jiang , Jian Li

Stochastic gradient descent (\textsc{Sgd}) methods are the most powerful optimization tools in training machine learning and deep learning models. Moreover, acceleration (a.k.a. momentum) methods and diagonal scaling (a.k.a. adaptive…

Machine Learning · Statistics 2018-10-02 Qi Deng , Yi Cheng , Guanghui Lan
‹ Prev 1 2 3 10 Next ›