Related papers: ErasureHead: Distributed Gradient Descent without …

Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers

Distributed gradient descent (DGD) is an efficient way of implementing gradient descent (GD), especially for large data sets, by dividing the computation tasks into smaller subtasks and assigning to different computing servers (CSs) to be…

Information Theory · Computer Science 2018-11-29 Emre Ozfatura, Deniz Gunduz, Sennur Ulukus

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

Distributed Stochastic Gradient Descent Using LDGM Codes

We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, the existence of slow-running machines called stragglers will cause a…

Information Theory · Computer Science 2019-01-16 Shunsuke Horii, Takahiro Yoshida, Manabu Kobayashi, Toshiyasu Matsushima

Approximate Gradient Coding with Optimal Decoding

In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which…

Machine Learning · Statistics 2021-08-09 Margalit Glasgow, Mary Wootters

Gradient Coding with Clustering and Multi-message Communication

Gradient descent (GD) methods are commonly employed in machine learning problems to optimize the parameters of the model in an iterative fashion. For problems with massive datasets, computations are distributed to many parallel computing…

Information Theory · Computer Science 2019-03-06 Emre Ozfatura, Deniz Gunduz, Sennur Ulukus

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Communication-Efficient Distributed SGD with Compressed Sensing

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

Robust Gradient Descent via Moment Encoding with LDPC Codes

This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of the stragglers, it has been previously…

Machine Learning · Statistics 2019-01-04 Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems in deep learning training tasks. CT-AGD is a general boosting procedure that accelerates…

Machine Learning · Computer Science 2026-05-18 Manuel Graca, L. Miguel Silveira, Arlindo Oliveira, Frank Liu

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science 2024-02-13 Anuraganand Sharma

Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning

Distributed implementations are crucial in speeding up large scale machine learning applications. Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers. A…

Information Theory · Computer Science 2021-03-02 Baturalp Buyukates, Emre Ozfatura, Sennur Ulukus, Deniz Gunduz

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic…

Machine Learning · Computer Science 2019-09-23 Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng

Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework

Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and…

Machine Learning · Computer Science 2025-05-16 Yijie Zhou, Shi Pu

Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and…

Machine Learning · Computer Science 2022-06-22 Pedro Soto, Ilia Ilmer, Haibin Guan, Jun Li

FastSGD: A Fast Compressed SGD Framework for Distributed Machine Learning

With the rapid increase of big data, distributed Machine Learning (ML) has been widely applied in training large-scale models. Stochastic Gradient Descent (SGD) is arguably the workhorse algorithm of ML. Distributed ML models trained by SGD…

Machine Learning · Computer Science 2021-12-09 Keyu Yang, Lu Chen, Zhihao Zeng, Yunjun Gao

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the…

Machine Learning · Computer Science 2025-02-27 Siyuan Yu, Wei Chen, H. Vincent Poor

DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging

The state-of-the-art deep learning algorithms rely on distributed training systems to tackle the increasing sizes of models and training data sets. Minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-02 Qinggang Zhou, Yawen Zhang, Pengcheng Li, Xiaoyong Liu, Jun Yang, Runsheng Wang, Ru Huang