Related papers: Elastic Bulk Synchronous Parallel Model for Distri…

200 papers

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

Machine learning (ML) models are increasingly trained in clusters with non-dedicated workers possessing heterogeneous resources. In such scenarios, model training efficiency can be negatively affected by stragglers -- workers that run much…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-09 Chen Chen, Qizhen Weng, Wei Wang, Baochun Li, Bo Li

The bulk-synchronous parallel (BSP) model provides a framework for writing parallel programs with predictable performance. In this paper we extend the BSP model to support what we will call pseudo-streaming algorithms for accelerators. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-24 Jan-Willem Buurlage, Tom Bannink, Abe Wits

Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters. A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-21 Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo

Self-paced learning (SPL) mimics the cognitive process of humans, who generally learn from easy samples to hard ones. One key issue in SPL is that the training process required for each instance weight depends on the other samples and thus…

Machine Learning · Computer Science 2018-07-09 Xuchao Zhang, Liang Zhao, Zhiqian Chen, Chang-Tien Lu

We propose a new data-centric synchronization framework for carrying out machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application-agnostic bulk…

Databases · Computer Science 2015-08-06 Naman Goel, Divyakant Agrawal, Sanjay Chawla, Ahmed Elmagarmid

Distributed synchronized GPU training is commonly used for deep learning. The resource constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long queuing time for resource allocation, and lowers the cluster…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-08 Mingzhen Li, Wencong Xiao, Biao Sun, Hanyu Zhao, Hailong Yang, Shiru Ren, Zhongzhi Luan, Xianyan Jia, Yi Liu, Yong Li, Wei Lin, Depei Qian

The Bulk-Synchronous Parallel model of computation has been used for the architecture independent design and analysis of parallel algorithms whose performance is expressed not only in terms of problem size n but also in terms of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-29 Alexandros V. Gerbessiotis, Constantinos J. Siniolakis

Bulk synchronous parallel (BSP) is the de-facto paradigm for distributed DNN training in today's production clusters. However, due to the global synchronization nature, its performance can be significantly influenced by network bottlenecks…

Machine Learning · Computer Science 2022-01-14 Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan

Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-06 Liang Wang, Ben Catterall, Richard Mortier

Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-19 Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

This article presents a new high-level parallel computational model named BSF - Bulk Synchronous Farm. The BSF model extends the BSP model to deal with the compute-intensive iterative numerical methods executed on distributed-memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-29 Leonid B. Sokolinsky

Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-22 Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

In distributed training, deep neural networks (DNNs) are launched over multiple workers concurrently and aggregate their local updates on each step in bulk-synchronous parallel (BSP) training. However, BSP does not scale out linearly due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-30 Sahil Tyagi, Martin Swany

With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each…

Machine Learning · Computer Science 2021-02-15 Guojun Xiong, Gang Yan, Rahul Singh, Jian Li

Probabilistic Synchronous Parallel (PSP) is a technique in distributed learning systems to reduce synchronization bottlenecks by sampling a subset of participating nodes per round. In Federated Learning (FL), where edge devices are often…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-20 Stefan Behfar, Richard Mortier

Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-19 Hanpeng Hu, Dan Wang, Chuan Wu

Extending the context length (i.e., the maximum supported sequence length) of LLMs is of paramount significance. To facilitate long context training of LLMs, sequence parallelism has emerged as an essential technique, which scatters each…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-12 Yujie Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui

Stochastic Gradient Descent is used to train models on large datasets because it reduces training time. On top of that, data parallelism is widely used to train neural networks efficiently across multiple worker nodes in parallel.…

Machine Learning · Computer Science 2024-07-02 Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Distributed deep learning (DDL) is a promising research area that aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu