Related papers: SySCD: A System-Aware Parallel Coordinate Descent …

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

An Asynchronous Parallel Stochastic Coordinate Descent Algorithm

We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong…

Optimization and Control · Mathematics 2014-11-12 Ji Liu , Stephen J. Wright , Christopher Ré , Victor Bittorf , Srikrishna Sridhar

Parallel Stochastic Gradient Descent with Sound Combiners

Stochastic gradient descent (SGD) is a well known method for regression and classification tasks. However, it is an inherently sequential algorithm at each step, the processing of the current example depends on the parameters learned from…

Machine Learning · Computer Science 2017-05-24 Saeed Maleki , Madanlal Musuvathi , Todd Mytkowicz

Fully Asynchronous Stochastic Coordinate Descent: A Tight Lower Bound on the Parallelism Achieving Linear Speedup

We seek tight bounds on the viable parallelism in asynchronous implementations of coordinate descent that achieves linear speedup. We focus on asynchronous coordinate descent (ACD) algorithms on convex functions which consist of the sum of…

Optimization and Control · Mathematics 2020-08-04 Yun Kuen Cheung , Richard Cole , Yixin Tao

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function plus a separable convex function. In contrast to previous…

Optimization and Control · Mathematics 2015-12-14 Ji Liu , Stephen J. Wright

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

A Parallel SGD method with Strong Convergence

This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node using the data residing in it) for finding the direction in each…

Machine Learning · Computer Science 2013-11-05 Dhruv Mahajan , S. Sathiya Keerthi , S. Sundararajan , Leon Bottou

Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: application to distributed MPC

In this paper we propose a parallel coordinate descent algorithm for solving smooth convex optimization problems with separable constraints that may arise e.g. in distributed model predictive control (MPC) for linear network systems. Our…

Optimization and Control · Mathematics 2014-11-19 Ion Necoara , Dragos Clipici

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent

Stochastic Dual Coordinate Descent (SDCD) has become one of the most efficient ways to solve the family of $\ell_2$-regularized empirical risk minimization problems, including linear SVM, logistic regression, and many others. The vanilla…

Machine Learning · Computer Science 2015-04-07 Cho-Jui Hsieh , Hsiang-Fu Yu , Inderjit S. Dhillon

Stochastic Gradient Descent on Highly-Parallel Architectures

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in…

Databases · Computer Science 2018-02-27 Yujing Ma , Florin Rusu , Martin Torres

MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is very useful in optimization problems with high-dimensional non-convex target functions, and hence constitutes an important component of several Machine Learning and Data Analytics methods. Recently there…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-11 Karl Bäckström , Marina Papatriantafilou , Philippas Tsigas

Scaling Up Coordinate Descent Algorithms for Large $\ell_1$ Regularization Problems

We present a generic framework for parallel coordinate descent (CD) algorithms that includes, as special cases, the original sequential algorithms Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm. We introduce…

Machine Learning · Computer Science 2012-07-04 Chad Scherrer , Mahantesh Halappanavar , Ambuj Tewari , David Haglin

Asynchronous Decentralized Parallel Stochastic Gradient Descent

Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a…

Optimization and Control · Mathematics 2018-09-26 Xiangru Lian , Wei Zhang , Ce Zhang , Ji Liu

Asynchronous Fully-Decentralized SGD in the Cluster-Based Model

This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function $Q$, as a core part of optimization and learning algorithms. Our algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-14 Hagit Attiya , Noa Schiller

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-25 Dan Alistarh , Christopher De Sa , Nikola Konstantinov

Fast Asynchronous Parallel Stochastic Gradient Decent

Stochastic gradient descent~(SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…

Machine Learning · Statistics 2015-08-25 Shen-Yi Zhao , Wu-Jun Li

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li , Linli Xu , Xiaowei Zhong , Qing Ling

Weighted parallel SGD for distributed unbalanced-workload training system

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal…

Machine Learning · Computer Science 2017-08-17 Cheng Daning , Li Shigang , Zhang Yunquan

Approximate Steepest Coordinate Descent

We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization. The efficiency of this novel scheme is provably better than the efficiency of uniformly random selection, and can reach…

Machine Learning · Computer Science 2017-06-27 Sebastian U. Stich , Anant Raj , Martin Jaggi