Related papers: Asynchronous Stochastic Proximal Optimization Algo…
We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise in many practical machine learning applications. However, compared to asynchronous…
We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that the asynchronous stochastic…
In the era of big data, optimizing large-scale machine learning problems has become a challenging task and has drawn significant attention. Asynchronous optimization algorithms have emerged as a promising solution. Recently, decoupled asynchronous…
We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…
Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to…
Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…
Stochastic gradient algorithms estimate the gradient based on only one or a few samples and enjoy low computational cost per iteration. They have been widely used in large-scale optimization problems. However, stochastic gradient algorithms…
In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors…
Variance reduction (VR) techniques for accelerating the convergence of the stochastic gradient descent (SGD) algorithm have received considerable research effort recently. VR's two variants, stochastic variance-reduced-gradient (SVRG-SGD) and…
In this paper, we propose a simple variant of the original stochastic variance reduced gradient (SVRG) method, which we hereafter refer to as variance reduced stochastic gradient descent (VR-SGD). Different from the choices of the snapshot…
Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…
Composition optimization has drawn a lot of attention in a wide variety of machine learning domains, from risk management to reinforcement learning. Existing methods for solving the composition optimization problem often work in a sequential and…
Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…
Asynchronous parallel implementations of stochastic gradient (SG) have been widely used to train deep neural networks and have achieved many practical successes recently. However, existing theories cannot explain their convergence and…
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly…
This paper investigates the problem of large-scale distributed composite convex optimization, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks,…
Optimizing the machine learning algorithms that are used to minimize the objective function has been of great interest. Several approaches to optimizing common algorithms, such as gradient descent and stochastic gradient descent, have been explored. One…
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction…
Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analyses of ASGD take a discrete view and prove upper bounds on its convergence rate. However, the…
We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole…
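As a rough illustration of the problem structure described in the last abstract (an average of smooth component functions plus a convex term with a simple proximal mapping), the following minimal sketch runs a proximal, variance-reduced stochastic gradient loop on a small lasso problem. The squared loss, the L1 regularizer, the function names (`soft_threshold`, `prox_svrg`, `objective`), the step size, and the use of the last inner iterate as the next snapshot are all assumptions made for illustration, not details taken from any of the listed papers.

```python
import random


def soft_threshold(v, t):
    """Proximal mapping of t * |v| for a scalar v (assumed L1 regularizer)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(X, y, lam, w):
    """(1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + lam * ||w||_1."""
    n = len(X)
    loss = 0.0
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, w)) - yi
        loss += 0.5 * r * r
    return loss / n + lam * sum(abs(wj) for wj in w)


def prox_svrg(X, y, lam, step, epochs, inner_steps, seed=0):
    """Hypothetical proximal SVRG-style loop for the lasso objective above."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def grad_i(w_vec, i):
        # Gradient of the i-th smooth component 0.5 * (x_i . w - y_i)^2.
        r = sum(a * b for a, b in zip(X[i], w_vec)) - y[i]
        return [r * a for a in X[i]]

    for _ in range(epochs):
        # Snapshot point and its full gradient, computed once per epoch.
        snapshot = list(w)
        full = [0.0] * d
        for i in range(n):
            g = grad_i(snapshot, i)
            for j in range(d):
                full[j] += g[j] / n

        for _ in range(inner_steps):
            i = rng.randrange(n)
            g_w = grad_i(w, i)
            g_s = grad_i(snapshot, i)
            # Variance-reduced gradient estimate, then a proximal step.
            w = [
                soft_threshold(
                    w[j] - step * (g_w[j] - g_s[j] + full[j]), step * lam
                )
                for j in range(d)
            ]
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    y = [2.0, 0.0, 2.0, 2.0]
    w = prox_svrg(X, y, lam=0.1, step=0.05, epochs=30, inner_steps=40)
    print(w, objective(X, y, 0.1, w))
```

The correction term `g_w - g_s + full` keeps the estimate unbiased while its variance shrinks as the iterate and the snapshot approach a minimizer, which is the general idea behind the variance reduction methods named in the abstracts above.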