Related papers: Stochastic Distributed Optimization for Machine Le…

Scheduling and Communication Schemes for Decentralized Federated Learning

Federated learning (FL) is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. One central server is not enough, due to…

Machine Learning · Computer Science 2023-11-28 Bahaa-Eldin Ali Abdelghany , Ana Fernández-Vilas , Manuel Fernández-Veiga , Nashwa El-Bendary , Ammar M. Hassan , Walid M. Abdelmoez

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-25 Dan Alistarh , Christopher De Sa , Nikola Konstantinov

Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent

While machine learning has achieved remarkable results in a wide variety of domains, the training of models often requires large datasets that may need to be collected from different individuals. As sensitive information may be contained in…

Machine Learning · Computer Science 2023-02-07 Richeng Jin , Xiaofan He , Huaiyu Dai

Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems,there have been a lot of interest on distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang , Shuai Zheng , James T. Kwok

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training. Synchronous Stochastic Gradient Descent (SSGD) 1 is the de facto DDL optimization method. Using a sufficiently large batch size is critical to…

Machine Learning · Computer Science 2021-12-03 Wei Zhang , Mingrui Liu , Yu Feng , Xiaodong Cui , Brian Kingsbury , Yuhai Tu

Cooperative SGD with Dynamic Mixing Matrices

One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…

Machine Learning · Computer Science 2025-08-22 Soumya Sarkar , Shweta Jain

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad , Hamad Alamri , Abdelhamid Bouchachia

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang , Diego Klabjan

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper , Franz-Josef Pfreundt

Decentralized Markov Chain Gradient Descent

Decentralized stochastic gradient method emerges as a promising solution for solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm - a variant of the decentralized…

Optimization and Control · Mathematics 2021-04-14 Tao Sun , Dongsheng Li

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky , Robert Busa-Fekete

On Generalization of Decentralized Learning with Separable Data

Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to…

Machine Learning · Computer Science 2023-03-28 Hossein Taheri , Christos Thrampoulidis

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…

Machine Learning · Computer Science 2022-04-29 Yunfei Teng , Wenbo Gao , Francois Chalus , Anna Choromanska , Donald Goldfarb , Adrian Weller

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings…

Machine Learning · Computer Science 2025-03-07 Shahryar Zehtabi , Dong-Jun Han , Rohit Parasnis , Seyyedali Hosseinalipour , Christopher G. Brinton

Distributed SGD Generalizes Well Under Asynchrony

The performance of fully synchronized distributed systems has faced a bottleneck due to the big data trend, under which asynchronous distributed systems are becoming a major popularity due to their powerful scalability. In this paper, we…

Machine Learning · Statistics 2019-10-01 Jayanth Regatti , Gaurav Tendolkar , Yi Zhou , Abhishek Gupta , Yingbin Liang

Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the…

Machine Learning · Computer Science 2025-02-27 Siyuan Yu , Wei Chen , H. Vincent Poor

Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu , Thomas Flynn , Shinjae Yoo , Nicholas D'Imperio

Towards Geo-Distributed Machine Learning

Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is "born" geographically distributed. On the other hand, many machine learning applications require a…

Machine Learning · Computer Science 2016-03-31 Ignacio Cano , Markus Weimer , Dhruv Mahajan , Carlo Curino , Giovanni Matteo Fumarola

Snap-Shot Decentralized Stochastic Gradient Tracking Methods

In decentralized optimization, $m$ agents form a network and only communicate with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent…

Optimization and Control · Mathematics 2022-12-13 Haishan Ye , Xiangyu Chang

Spectral-Convergent Decentralized Machine Learning: Theory and Application in Space Networks

Decentralized machine learning (DML) supports collaborative training in large-scale networks with no central server. It is sensitive to the quality and reliability of inter-device communications that result in time-varying and stochastic…

Signal Processing · Electrical Eng. & Systems 2025-11-06 Zhiyuan Zhai , Shuyan Hu , Wei Ni , Xiaojun Yuan , Xin Wang