Related papers: Probabilistic Synchronous Parallel

Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure

Probabilistic Synchronous Parallel (PSP) is a technique in distributed learning systems to reduce synchronization bottlenecks by sampling a subset of participating nodes per round. In Federated Learning (FL), where edge devices are often…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-20 · Stefan Behfar, Richard Mortier
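The core PSP idea summarised above is that a worker checks only a sampled subset of nodes at each synchronisation point. A small barrier check can illustrate it. This is a minimal sketch under stated assumptions, not the method from the paper: `psp_can_advance`, `sample_size` and `staleness` are hypothetical names. The staleness test borrows the Stale Synchronous Parallel rule, under which a worker may run at most `staleness` steps ahead of the peers it inspects.

```python
import random
from typing import Sequence


def psp_can_advance(my_step: int, peer_steps: Sequence[int], sample_size: int,
                    staleness: int, rng: random.Random) -> bool:
    """Hypothetical PSP-style barrier check for one worker (illustrative only).

    A full BSP/SSP barrier inspects every peer. Here the worker inspects
    only a random sample of `sample_size` peers. It may advance if each
    sampled peer is at most `staleness` steps behind it.
    """
    k = min(sample_size, len(peer_steps))
    sampled = rng.sample(list(peer_steps), k)  # uniform sample without replacement
    return all(my_step - step <= staleness for step in sampled)


if __name__ == "__main__":
    rng = random.Random(42)
    peers = [10, 10, 9, 4]  # current step counters of the other workers
    # Full barrier (all peers, no staleness): blocked by the workers at steps 9 and 4.
    print(psp_can_advance(10, peers, sample_size=4, staleness=0, rng=rng))
    # Sampled barrier: the outcome depends on whether the straggler at step 4 is drawn.
    print(psp_can_advance(10, peers, sample_size=2, staleness=1, rng=rng))
```

Setting `sample_size` to the number of peers and `staleness=0` reduces the check to a full BSP barrier. Smaller samples weaken the consistency guarantee, but they also mean less waiting on slow or failed devices.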

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-09-04 · Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics · 2016-05-24 · Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring

While distributed training significantly speeds up the training process of the deep neural network (DNN), the utilization of the cluster is relatively low due to the time-consuming data synchronizing between workers. To alleviate this…

Machine Learning · Computer Science · 2020-12-01 · Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-10-06 · Janis Keuper, Franz-Josef Pfreundt

Weighted parallel SGD for distributed unbalanced-workload training system

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal…

Machine Learning · Computer Science · 2017-08-17 · Cheng Daning, Li Shigang, Zhang Yunquan

Hybrid Approach to Parallel Stochastic Gradient Descent

Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel.…

Machine Learning · Computer Science · 2024-07-02 · Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning

Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters. A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-21 · Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science · 2024-02-13 · Anuraganand Sharma

Enhancing Parallelism in Decentralized Stochastic Convex Optimization

Decentralized learning has emerged as a powerful approach for handling large datasets across multiple machines in a communication-efficient manner. However, such methods often face scalability limitations, as increasing the number of…

Machine Learning · Computer Science · 2025-06-03 · Ofri Eisen, Ron Dorfman, Kfir Y. Levy

High Throughput Synchronous Distributed Stochastic Gradient Descent

We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-03-14 · Michael Teng, Frank Wood

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics · 2022-10-07 · Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Synchronization-Based Cooperative Distributed Model Predictive Control

Distributed control algorithms are known to reduce overall computation time compared to centralized control algorithms. However, they can result in inconsistent solutions leading to the violation of safety-critical constraints. Inconsistent…

Systems and Control · Electrical Eng. & Systems · 2024-11-26 · Julius Beerwerth, Maximilian Kloock, Bassam Alrifaee

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-25 · Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Sampling Parallelism for Fast and Efficient Bayesian Learning

Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is…

Machine Learning · Computer Science · 2026-04-07 · Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

In the realm of big data and machine learning, data-parallel, distributed stochastic algorithms have drawn significant attention in the present days. While the synchronous versions of these algorithms are well understood in terms of their…

Optimization and Control · Mathematics · 2020-04-07 · Atal Narayan Sahu, Aritra Dutta, Aashutosh Tiwari, Peter Richtárik

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics · 2018-10-30 · Ashok Cutkosky, Robert Busa-Fekete

BSP Sorting: An experimental Study

The Bulk-Synchronous Parallel model of computation has been used for the architecture independent design and analysis of parallel algorithms whose performance is expressed not only in terms of problem size n but also in terms of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-08-29 · Alexandros V. Gerbessiotis, Constantinos J. Siniolakis

Efficient Parallel Self-Adjusting Computation

Self-adjusting computation is an approach for automatically producing dynamic algorithms from static ones. The approach works by tracking control and data dependencies, and propagating changes through the dependencies when making an update.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-05-17 · Daniel Anderson, Guy E. Blelloch, Anubhav Baweja, Umut A. Acar

ASAP: Asynchronous Approximate Data-Parallel Computation

Emerging workloads, such as graph processing and machine learning are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-12-28 · Asim Kadav, Erik Kruus