English
Related papers

Related papers: Stochastic Gradient Variance Reduction by Solving …

200 papers

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

Stochastic Gradient Decent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic…

Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient…

Machine Learning · Computer Science 2026-03-09 Zhipeng Yao , Rui Yu , Guisong Chang , Ying Li , Yu Zhang , Dazhou Li

Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini batch of samples is drawn from the training dataset and the weights of the neural network are adjusted…

Disordered Systems and Neural Networks · Physics 2022-09-07 Francesca Mignacco , Pierfrancesco Urbani

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen , Ronny Luss

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang , Taro Toyoizumi

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang , Suyog Gupta , Xiangru Lian , Ji Liu

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen , Nam H. Nguyen , Dzung T. Phan , Jayant R. Kalagnanam , Katya Scheinberg

Momentum plays a crucial role in stochastic gradient-based optimization algorithms for accelerating or improving training deep neural networks (DNNs). In deep learning practice, the momentum is usually weighted by a well-calibrated…

Machine Learning · Computer Science 2020-12-04 Bao Wang , Qiang Ye

Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many theoretical analyses of stochastic gradient algorithms. In this…

Machine Learning · Statistics 2021-09-16 Stephan Wojtowytsch

Stochastic gradient descent (SGD) still is the workhorse for many practical problems. However, it converges slow, and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably. But many attempts…

Machine Learning · Statistics 2017-02-23 Xi-Lin Li

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization. However, the internal dynamics of these methods remain underexplored. In this paper, we analyze gradient behavior through…

Machine Learning · Computer Science 2025-03-11 Zhipeng Yao , Rui Yu , Guisong Chang , Ying Li , Yu Zhang , Dazhou Li

Deep neural networks (DNNs) have achieved remarkable success in computer vision; however, training DNNs for satisfactory performance remains challenging and suffers from sensitivity to empirical selections of an optimization algorithm for…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Haichao Zhang , Kuangrong Hao , Lei Gao , Bing Wei , Xuesong Tang

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin , Ohad Shamir , Karthik Sridharan

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby…

Optimization and Control · Mathematics 2025-03-11 Azar Louzi

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni , Gianluca Maguolo , Alessandra Lumini

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic , Alexandre Bouchard-Côté , Trevor Campbell

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu , Thomas Flynn , Shinjae Yoo , Nicholas D'Imperio
‹ Prev 1 2 3 10 Next ›