Related papers: Accelerating Distributed ML Training via Selective…

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Bulk synchronous parallel (BSP) is the de-facto paradigm for distributed DNN training in today's production clusters. However, due to the global synchronization nature, its performance can be significantly influenced by network bottlenecks…

Machine Learning · Computer Science 2022-01-14 Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan

Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize…

Machine Learning · Computer Science 2022-02-09 Daniel Coquelin, Charlotte Debus, Markus Götz, Fabrice von der Lehr, James Kahn, Martin Siggel, Achim Streit

Selfish Sparse RNN Training

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained…

Machine Learning · Computer Science 2021-06-16 Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting

We propose a distributed approach to train deep neural networks (DNNs), which has guaranteed convergence theoretically and great scalability empirically: close to 6 times faster on an instance of the ImageNet data set when run with 6 machines. The…

Machine Learning · Statistics 2016-10-04 Abhimanu Kumar, Pengtao Xie, Junming Yin, Eric P. Xing

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning

Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters. A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-21 Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-12 Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees…

Optimization and Control · Mathematics 2020-07-14 Vyacheslav Kungurtsev, Malcolm Egan, Bapi Chatterjee, Dan Alistarh

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring

While distributed training significantly speeds up the training process of the deep neural network (DNN), the utilization of the cluster is relatively low due to the time-consuming data synchronizing between workers. To alleviate this…

Machine Learning · Computer Science 2020-12-01 Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient Descent (S-SGD) and the model averaging methods, are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to its easy…

Machine Learning · Computer Science 2022-11-04 Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv

Revisiting Distributed Synchronous SGD

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-21 Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Revisiting Distributed Synchronous SGD

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In…

Machine Learning · Computer Science 2017-03-22 Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz

OSP: Boosting Distributed Model Training with 2-stage Synchronization

Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large size of datasets and models. As the computation capability of DDL nodes continues to increase,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described…

Machine Learning · Statistics 2018-05-31 Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

The ever-growing volume and decentralized nature of data, coupled with the need to harness it and extract knowledge, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local…

Machine Learning · Computer Science 2024-11-22 Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis

How to scale distributed deep learning?

Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS).…

Machine Learning · Computer Science 2016-11-15 Peter H. Jin, Qiaochu Yuan, Forrest Iandola, Kurt Keutzer

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View

Deep learning has become an indispensable part of life, in applications such as face recognition and NLP, but training deep models has always been a challenge, and in recent years, the complexity of training data and models has shown explosive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-18 Sheng Huang

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

Experiments on Parallel Training of Deep Neural Network using Model Averaging

In this work we apply model averaging to parallel training of deep neural network (DNN). Parallelization is done in a model averaging manner. Data is partitioned and distributed to different nodes for local model updates, and model…

Machine Learning · Computer Science 2018-07-03 Hang Su, Haoyu Chen