Related papers: Asynchronous Stochastic Composition Optimization w…

Asynchronous Stochastic Block Coordinate Descent with Variance Reduction

Asynchronous parallel implementations for stochastic optimization have received huge successes in theory and practice recently. Asynchronous implementations with lock-free are more efficient than the one with writing or reading lock. In…

Machine Learning · Computer Science 2016-11-15 Bin Gu , Zhouyuan Huo , Heng Huang

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that the asynchronous stochastic…

Machine Learning · Computer Science 2016-12-21 Zhouyuan Huo , Heng Huang

Fast Asynchronous Parallel Stochastic Gradient Decent

Stochastic gradient descent~(SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…

Machine Learning · Statistics 2015-08-25 Shen-Yi Zhao , Wu-Jun Li

Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang , Shuai Zheng , James T. Kwok

Variance Reduced methods for Non-convex Composition Optimization

This paper explores the non-convex composition optimization in the form including inner and outer finite-sum functions with a large number of component functions. This problem arises in some important applications such as nonlinear…

Machine Learning · Statistics 2017-11-15 Liu Liu , Ji Liu , Dacheng Tao

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi , Ahmed Hefny , Suvrit Sra , Barnabás Póczos , Alex Smola

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal…

Machine Learning · Computer Science 2016-09-28 Qi Meng , Wei Chen , Jingcheng Yu , Taifeng Wang , Zhi-Ming Ma , Tie-Yan Liu

Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization

Stochastic compositional optimization generalizes classic (non-compositional) stochastic optimization to the minimization of compositions of functions. Each composition may introduce an additional expectation. The series of expectations may…

Optimization and Control · Mathematics 2021-09-29 Tianyi Chen , Yuejiao Sun , Wotao Yin

Efficient Distributed SGD with Variance Reduction

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high…

Machine Learning · Computer Science 2017-04-10 Soham De , Tom Goldstein

Multi-Level Stochastic Gradient Methods for Nested Composition Optimization

Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level…

Optimization and Control · Mathematics 2018-01-15 Shuoguang Yang , Mengdi Wang , Ethan X. Fang

Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction

Zeroth-order (derivative-free) optimization attracts a lot of attention in machine learning, because explicit gradient calculations may be computationally expensive or infeasible. To handle large scale problems both in volume and dimension,…

Machine Learning · Computer Science 2016-12-06 Bin Gu , Zhouyuan Huo , Heng Huang

Distributed Linear Regression with Compositional Covariates

With the availability of extraordinarily huge data sets, solving the problems of distributed statistical methodology and computing for such data sets has become increasingly crucial in the big data area. In this paper, we focus on the…

Machine Learning · Statistics 2023-10-24 Yue Chao , Lei Huang , Xuejun Ma

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou , Panayotis Mertikopoulos , Nicholas Bambos , Peter W. Glynn , Yinyu Ye

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

Faster Adaptive Momentum-Based Federated Methods for Distributed Composition Optimization

Federated Learning is a popular distributed learning paradigm in machine learning. Meanwhile, composition optimization is an effective hierarchical learning model, which appears in many machine learning applications such as meta learning…

Machine Learning · Computer Science 2023-03-31 Feihu Huang

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky , Robert Busa-Fekete

Parallel optimized sampling for stochastic equations

Stochastic equations play an important role in computational science, due to their ability to treat a wide variety of complex statistical problems. However, current algorithms are strongly limited by their sampling variance, which scales…

Numerical Analysis · Mathematics 2017-01-04 Bogdan Opanchuk , Simon Kiesewetter , Peter D. Drummond

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…

Optimization and Control · Mathematics 2020-07-14 Vyacheslav Kungurtsev , Malcolm Egan , Bapi Chatterjee , Dan Alistarh

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

Machine learning with big data often involves large optimization models. For distributed optimization over a cluster of machines, frequent communication and synchronization of all model parameters (optimization variables) can be very…

Optimization and Control · Mathematics 2017-10-17 Lin Xiao , Adams Wei Yu , Qihang Lin , Weizhu Chen

Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization

Stochastic composition optimization draws much attention recently and has been successful in many emerging applications of machine learning, statistical analysis, and reinforcement learning. In this paper, we focus on the composition…

Machine Learning · Computer Science 2018-01-01 Zhouyuan Huo , Bin Gu , Ji Liu , Heng Huang