Related papers: Asynchronous Stochastic Proximal Optimization Algo…

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization

We study stochastic algorithms for solving nonconvex optimization problems with a convex yet possibly nonsmooth regularizer, which arise in many practical machine learning applications. However, compared to asynchronous…

Machine Learning · Computer Science 2018-09-18 Rui Zhu, Di Niu, Zongpeng Li

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that the asynchronous stochastic…

Machine Learning · Computer Science 2016-12-21 Zhouyuan Huo, Heng Huang

Decoupled Asynchronous Proximal Stochastic Gradient Descent with Variance Reduction

In the era of big data, optimizing large scale machine learning problems becomes a challenging task and draws significant attention. Asynchronous optimization algorithms come out as a promising solution. Recently, decoupled asynchronous…

Machine Learning · Computer Science 2016-09-30 Zhouyuan Huo, Bin Gu, Heng Huang

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems

Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to…

Machine Learning · Computer Science 2018-10-23 Rui Zhu, Di Niu

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling

Linear Convergence of Variance-Reduced Stochastic Gradient without Strong Convexity

Stochastic gradient algorithms estimate the gradient based on only one or a few samples and enjoy low computational cost per iteration. They have been widely used in large-scale optimization problems. However, stochastic gradient algorithms…

Numerical Analysis · Computer Science 2015-07-13 Pinghua Gong, Jieping Ye

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors…

Machine Learning · Computer Science 2018-10-31 Fanhua Shang, Kaiwen Zhou, Hongying Liu, James Cheng, Ivor W. Tsang, Lijun Zhang, Dacheng Tao, Licheng Jiao

IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling

Variance reduction (VR) techniques for convergence rate acceleration of stochastic gradient descent (SGD) algorithm have been developed with great efforts recently. VR's two variants, stochastic variance-reduced-gradient (SVRG-SGD) and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-11 Fei Wang, Jun Ye, Weichen Li, Guihai Chen

Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

In this paper, we propose a simple variant of the original stochastic variance reduction gradient (SVRG), where hereafter we refer to as the variance reduced stochastic gradient descent (VR-SGD). Different from the choices of the snapshot…

Machine Learning · Computer Science 2017-04-18 Fanhua Shang

Fast Asynchronous Parallel Stochastic Gradient Decent

Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…

Machine Learning · Statistics 2015-08-25 Shen-Yi Zhao, Wu-Jun Li

Asynchronous Stochastic Composition Optimization with Variance Reduction

Composition optimization has drawn a lot of attention in a wide variety of machine learning domains from risk management to reinforcement learning. Existing methods solving the composition optimization problem often work in a sequential and…

Optimization and Control · Mathematics 2018-11-16 Shuheng Shen, Linli Xu, Jingchang Liu, Junliang Guo, Qing Ling

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…

Optimization and Control · Mathematics 2020-07-14 Vyacheslav Kungurtsev, Malcolm Egan, Bapi Chatterjee, Dan Alistarh

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and…

Optimization and Control · Mathematics 2019-04-22 Xiangru Lian, Yijun Huang, Yuncheng Li, Ji Liu

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly…

Optimization and Control · Mathematics 2018-12-04 Zhize Li, Jian Li

S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems

This paper investigates the problem of large-scale distributed composite convex optimization, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks,…

Optimization and Control · Mathematics 2025-12-16 Maoran Wang, Xingju Cai, Yongxin Chen

Computational Complexity of Sub-Linear Convergent Algorithms

Optimizing machine learning algorithms that are used to solve the objective function has been of great interest. Several approaches to optimize common algorithms, such as gradient descent and stochastic gradient descent, were explored. One…

Machine Learning · Computer Science 2022-10-06 Hilal AlQuabeh, Farha AlBreiki, Dilshod Azizov

Generalization Error Bounds for Optimization Algorithms via Stability

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction…

Machine Learning · Statistics 2016-09-28 Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

Differential Equations for Modeling Asynchronous Algorithms

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analysis on ASGD take a discrete view and prove upper bounds for their convergence rates. However, the…

Machine Learning · Statistics 2018-05-09 Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

A Proximal Stochastic Gradient Method with Progressive Variance Reduction

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole…

Optimization and Control · Mathematics 2014-03-20 Lin Xiao, Tong Zhang