Related papers: Heterogeneity-Aware Asynchronous Decentralized Tra…

Asynchronous Decentralized Parallel Stochastic Gradient Descent

Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a…

Optimization and Control · Mathematics 2018-09-26 Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu

Task allocation for decentralized training in heterogeneous environment

The demand for large-scale deep learning is increasing, and distributed training is the current mainstream solution. Ring AllReduce is widely used as a data parallel decentralized algorithm. However, in a heterogeneous environment, each…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-17 Yongyue Chao, Mingxue Liao, Jiaxin Gao

Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems

Most parallel neural network training methods assume homogeneous computing resources. For example, synchronous data-parallel SGD suffers from significant synchronization overhead under heterogeneous workloads, often forcing practitioners to…

Machine Learning · Computer Science 2026-02-24 Jihyun Lim, Junhyuk Jo, Chanhyeok Ko, Young Min Go, Jimin Hwa, Sunwoo Lee

Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning

Distributed training algorithms of deep neural networks show impressive convergence speedup properties on very large problems. However, they inherently suffer from communication related slowdowns and communication topology becomes a crucial…

Machine Learning · Computer Science 2022-03-25 Tomer Avidor, Nadav Tal Israel

Asynchronous Decentralized Distributed Training of Acoustic Models

Large-scale distributed training of deep acoustic models plays an important role in today's high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training…

Computation and Language · Computer Science 2021-10-22 Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung

Improving Efficiency in Large-Scale Decentralized Distributed Training

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Decentralized Parallel SGD (AD-PSGD) are a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks. One…

Machine Learning · Computer Science 2020-02-05 Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

Following AI scaling trends, frontier models continue to grow in size and continue to be trained on larger datasets. Training these models requires huge investments in exascale computational resources, which has in turn driven development…

Machine Learning · Computer Science 2025-09-18 Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View

Deep learning has become an indispensable part of life, in applications such as face recognition and NLP, but training deep models has always been a challenge, and in recent years the complexity of training data and models has shown explosive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-18 Sheng Huang

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

Modern Automatic Speech Recognition (ASR) systems rely on distributed deep learning for quick training completion. To enable efficient distributed training, it is imperative that the training algorithms can converge with a large…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-15 Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data

We consider the distributed learning problem with data dispersed across multiple workers under the orchestration of a central server. Asynchronous Stochastic Gradient Descent (SGD) has been widely explored in such a setting to reduce the…

Machine Learning · Computer Science 2024-05-28 Xiaolu Wang, Yuchang Sun, Hoi-To Wai, Jun Zhang

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training

Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based…

Machine Learning · Computer Science 2019-01-14 Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, Alexander Schwing

Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity

Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on…

Optimization and Control · Mathematics 2026-02-20 Artavazd Maranjyan, Peter Richtárik

Efficient AllReduce with Stragglers

Distributed machine learning workloads use data and tensor parallelism for training and inference, both of which rely on the AllReduce collective to synchronize gradients or activations. However, AllReduce algorithms are delayed by the…

Machine Learning · Computer Science 2025-09-30 Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh

Hop: Heterogeneity-Aware Decentralized Training

Recent work has shown that decentralized algorithms can deliver superior performance over centralized ones in the context of machine learning. The two approaches, with the main difference residing in their distinct communication patterns,…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-08 Qinyi Luo, Jinkun Lin, Youwei Zhuo, Xuehai Qian

ScaDLES: Scalable Deep Learning over Streaming data at the Edge

Distributed deep learning (DDL) training systems are designed for cloud and data-center environments that assume homogeneous compute resources, high network bandwidth, sufficient memory and storage, as well as independent and identically…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-30 Sahil Tyagi, Martin Swany

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data

Decentralized training enables learning with distributed datasets generated at different locations without relying on a central server. In realistic scenarios, the data distribution across these sparsely connected learning agents can be…

Machine Learning · Computer Science 2025-02-27 Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks. In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an…

Machine Learning · Computer Science 2021-06-21 Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data

Artificial intelligence has advanced rapidly through large neural networks trained on massive datasets using thousands of GPUs or TPUs. Such training can occupy entire data centers for weeks and requires enormous computational and energy…

Optimization and Control · Mathematics 2026-01-07 Artavazd Maranjyan

When Less is More: Achieving Faster Convergence in Distributed Edge Machine Learning

Distributed Machine Learning (DML) on resource-constrained edge devices holds immense potential for real-world applications. However, achieving fast convergence in DML in these heterogeneous environments remains a significant challenge.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-10 Advik Raj Basani, Siddharth Chaitra Vivek, Advaith Krishna, Arnab K. Paul

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou