Related papers: Homomorphic Parameter Compression for Distributed …

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-13 Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

Reducing Data Motion to Accelerate the Training of Deep Neural Networks

This paper reduces the cost of DNNs training by decreasing the amount of data movement across heterogeneous architectures composed of several GPUs and multicore CPU devices. In particular, this paper proposes an algorithm to dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-07 Sicong Zhuang, Cristiano Malossi, Marc Casas

Slim-DP: A Light Communication Data Parallelism for DNN

Data parallelism has emerged as a necessary technique to accelerate the training of deep neural networks (DNN). In a typical data parallelism approach, the local workers push the latest updates of all the parameters to the parameter server…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-28 Shizhao Sun, Wei Chen, Jiang Bian, Xiaoguang Liu, Tie-Yan Liu

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed deep learning (DL) has become prevalent in recent years to reduce training time by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and datasets. However, system scalability is limited by…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-04 Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu

THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

Deep neural networks (DNNs) are the de facto standard for essential use cases, such as image classification, computer vision, and natural language processing. As DNNs and datasets get larger, they require distributed training on…

Machine Learning · Computer Science 2024-03-07 Minghao Li, Ran Ben Basat, Shay Vargaftik, ChonLam Lao, Kevin Xu, Michael Mitzenmacher, Minlan Yu

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive…

Machine Learning · Computer Science 2024-03-13 Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks

A change of the prevalent supervised learning techniques is foreseeable in the near future: from the complex, computational expensive algorithms to more flexible and elementary training ones. The strong revitalization of randomized…

Machine Learning · Computer Science 2022-09-02 Antonello Rosato, Massimo Panella, Evgeny Osipov, Denis Kleyko

Priority-based Parameter Propagation for Distributed DNN Training

Data parallel training is widely used for scaling distributed deep neural network (DNN) training. However, the performance benefits are often limited by the communication-heavy parameter synchronization step. In this paper, we take…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-13 Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko

Does compressing activations help model parallel training?

Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to…

Machine Learning · Computer Science 2023-01-09 Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

Distributed Training of Large Graph Neural Networks with Variable Communication Rates

Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements. Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to…

Machine Learning · Computer Science 2024-06-26 Juan Cervino, Md Asadullah Turja, Hesham Mostafa, Nageen Himayat, Alejandro Ribeiro

Communication Compression for Decentralized Training

Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: communication compression for low bandwidth networks,…

Machine Learning · Computer Science 2019-02-04 Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-30 Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Hyperparameter optimization is both a practical issue and an interesting theoretical problem in training of deep architectures. Despite many recent advances the most commonly used methods almost universally involve training multiple and…

Machine Learning · Computer Science 2019-09-10 Vlad Pushkarov, Jonathan Efroni, Mykola Maksymenko, Maciej Koch-Janusz

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still remains considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

Distributed Training and Optimization Of Neural Networks

Deep learning models are yielding increasingly better performances thanks to multiple factors. To be successful, model may have large number of parameters or complex architectures and be trained on large dataset. This leads to large…

Machine Learning · Computer Science 2022-12-20 Jean-Roch Vlimant, Junqi Yin

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression

In training of modern large natural language processing (NLP) models, it has become a common practice to split models using 3D parallelism to multiple GPUs. Such technique, however, suffers from a high overhead of inter-node communication.…

Machine Learning · Computer Science 2023-01-25 Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, Hyung-Jin Kim, Youngsok Kim, Jinho Lee

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

Deep Hierarchy Quantization Compression algorithm based on Dynamic Sampling

Unlike traditional distributed machine learning, federated learning stores data locally for training and then aggregates the models on the server, which solves the data security problem that may arise in traditional distributed machine…

Machine Learning · Computer Science 2023-01-02 Wan Jiang, Gang Liu, Xiaofeng Chen, Yipeng Zhou

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Scaling models has led to significant advancements in deep learning, but training these models in decentralized settings remains challenging due to communication bottlenecks. While existing compression techniques are effective in…

Machine Learning · Computer Science 2025-06-03 Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long