Related papers: Elastic Bulk Synchronous Parallel Model for Distri…

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-09-04 · Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments

Machine learning (ML) models are increasingly trained in clusters with non-dedicated workers possessing heterogeneous resources. In such scenarios, model training efficiency can be negatively affected by stragglers -- workers that run much…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-12-09 · Chen Chen, Qizhen Weng, Wei Wang, Baochun Li, Bo Li

Bulk-synchronous pseudo-streaming algorithms for many-core accelerators

The bulk-synchronous parallel (BSP) model provides a framework for writing parallel programs with predictable performance. In this paper we extend the BSP model to support what we will call pseudo-streaming algorithms for accelerators. We…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-03-24 · Jan-Willem Buurlage, Tom Bannink, Abe Wits

Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning

Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters. A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-21 · Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo

Distributed Self-Paced Learning in Alternating Direction Method of Multipliers

Self-paced learning (SPL) mimics the cognitive process of humans, who generally learn from easy samples to hard ones. One key issue in SPL is that the training process required for each instance weight depends on the other samples and thus…

Machine Learning · Computer Science · 2018-07-09 · Xuchao Zhang, Liang Zhao, Zhiqian Chen, Chang-Tien Lu

Parameter Database : Data-centric Synchronization for Scalable Machine Learning

We propose a new data-centric synchronization framework for carrying out machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application-agnostic bulk…

Databases · Computer Science · 2015-08-06 · Naman Goel, Divyakant Agrawal, Sanjay Chawla, Ahmed Elmagarmid

EasyScale: Accuracy-consistent Elastic Training for Deep Learning

Distributed synchronized GPU training is commonly used for deep learning. The resource constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long queuing time for resource allocation, and lowers the cluster…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-11-08 · Mingzhen Li, Wencong Xiao, Biao Sun, Hanyu Zhao, Hailong Yang, Shiru Ren, Zhongzhi Luan, Xianyan Jia, Yi Liu, Yong Li, Wei Lin, Depei Qian

BSP Sorting: An experimental Study

The Bulk-Synchronous Parallel model of computation has been used for the architecture independent design and analysis of parallel algorithms whose performance is expressed not only in terms of problem size n but also in terms of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-08-29 · Alexandros V. Gerbessiotis, Constantinos J. Siniolakis

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training

Bulk synchronous parallel (BSP) is the de facto paradigm for distributed DNN training in today's production clusters. However, because it synchronizes globally, its performance can be significantly influenced by network bottlenecks…

Machine Learning · Computer Science · 2022-01-14 · Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan

Probabilistic Synchronous Parallel

Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-10-06 · Liang Wang, Ben Catterall, Richard Mortier

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-19 · Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

Analytical Estimation of the Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors

This article presents a new high-level parallel computational model named BSF - Bulk Synchronous Farm. The BSF model extends the BSP model to deal with the compute-intensive iterative numerical methods executed on distributed-memory…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-05-29 · Leonid B. Sokolinsky

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-22 · Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

Accelerating Distributed ML Training via Selective Synchronization

In distributed training, deep neural networks (DNNs) are launched over multiple workers concurrently and aggregate their local updates on each step in bulk-synchronous parallel (BSP) training. However, BSP does not scale out linearly due to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-30 · Sahil Tyagi, Martin Swany

Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers

With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each…

Machine Learning · Computer Science · 2021-02-15 · Guojun Xiong, Gang Yan, Rahul Singh, Jian Li

Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure

Probabilistic Synchronous Parallel (PSP) is a technique in distributed learning systems to reduce synchronization bottlenecks by sampling a subset of participating nodes per round. In Federated Learning (FL), where edge devices are often…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-20 · Stefan Behfar, Richard Mortier

Distributed Machine Learning through Heterogeneous Edge Systems

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-11-19 · Hanpeng Hu, Dan Wang, Chuan Wu

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism

Extending the context length (i.e., the maximum supported sequence length) of LLMs is of paramount significance. To facilitate long context training of LLMs, sequence parallelism has emerged as an essential technique, which scatters each…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-12 · Yujie Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui

Hybrid Approach to Parallel Stochastic Gradient Descent

Stochastic Gradient Descent is used to train models on large datasets and reduce the training time. In addition, data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel.…

Machine Learning · Computer Science · 2024-07-02 · Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

OSP: Boosting Distributed Model Training with 2-stage Synchronization

Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-07-11 · Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu