Related papers: Stochastic Gradient Push for Distributed Deep Lear…
Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration…
We consider a decentralized learning setting in which data is distributed over nodes in a graph. The goal is to learn a global model on the distributed data without involving any central entity that needs to be trusted. While gossip-based…
We study the asynchronous stochastic gradient descent algorithm for distributed training over $n$ workers which have varying computation and communication frequency over time. In this algorithm, workers compute stochastic gradients in…
We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual…
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…
Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…
We consider solving a convex, possibly stochastic optimization problem over a randomly time-varying multi-agent network. Each agent has access to some local objective function, and it only has unbiased estimates of the gradients of the…
Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…
With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…
This paper considers a general data-fitting problem over a networked system, in which many computing nodes are connected by an undirected graph. This kind of problem can find many real-world applications and has been studied extensively in…
The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…
In distributed and federated learning algorithms, communication overhead is often reduced by performing multiple local updates between communication rounds. However, due to data heterogeneity across nodes and the local gradient noise within…
The state-of-the-art deep learning algorithms rely on distributed training systems to tackle the increasing sizes of models and training data sets. Minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back…
Modern supervised learning techniques, particularly those using deep nets, involve fitting high dimensional labelled data sets with functions containing very large numbers of parameters. Much of this work is empirical. Interesting phenomena…
In this paper, we propose a Differentially Private Stochastic Gradient Push with Compressed communication (termed DP-CSGP) for decentralized learning over directed graphs. Different from existing works, the proposed algorithm is designed to…
Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…
Distributed stochastic gradient descent (SGD) is essential for scaling the machine learning algorithms to a large number of computing nodes. However, the infrastructures variability such as high communication delay or random node slowdown…
One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…