Related papers: Stochastic Ratios Tracking Algorithm for Large Sca…
We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…
Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…
Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…
Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…
In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…
Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are…
In stochastic optimization, a common tool to deal sequentially with large sample is to consider the well-known stochastic gradient algorithm. Nevertheless, since the stepsequence is the same for each direction, this can lead to bad results…
The choice of step-size used in Stochastic Gradient Descent (SGD) optimization is empirically selected in most training procedures. Moreover, the use of scheduled learning techniques such as Step-Decaying, Cyclical-Learning, and Warmup to…
Stochastic gradient descent (SGD) holds as a classical method to build large scale machine learning models over big data. A stochastic gradient is typically calculated from a limited number of samples (known as mini-batch), so it…
Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…
Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several…
Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning.…
This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes…
A framework previously introduced in [3] for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The…
Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…
A framework is introduced for solving a sequence of slowly changing optimization problems, including those arising in regression and classification applications, using optimization algorithms such as stochastic gradient descent (SGD). The…
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR). The proposed…
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a…
We introduce a novel and efficient algorithm called the stochastic approximate gradient descent (SAGD), as an alternative to the stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained.…
Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants…