Related papers: Stochastic Distributed Optimization for Machine Le…
Federated learning (FL) is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. One central server is not enough, due to…
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…
While machine learning has achieved remarkable results in a wide variety of domains, the training of models often requires large datasets that may need to be collected from different individuals. As sensitive information may be contained in…
With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…
Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training. Synchronous Stochastic Gradient Descent (SSGD) 1 is the de facto DDL optimization method. Using a sufficiently large batch size is critical to…
One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…
The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…
Decentralized stochastic gradient method emerges as a promising solution for solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm - a variant of the decentralized…
Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…
Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to…
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…
Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings…
The performance of fully synchronized distributed systems has faced a bottleneck due to the big data trend, under which asynchronous distributed systems are becoming a major popularity due to their powerful scalability. In this paper, we…
Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the…
Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…
Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is "born" geographically distributed. On the other hand, many machine learning applications require a…
In decentralized optimization, $m$ agents form a network and only communicate with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent…
Decentralized machine learning (DML) supports collaborative training in large-scale networks with no central server. It is sensitive to the quality and reliability of inter-device communications that result in time-varying and stochastic…