Related papers: Model-Parallel Model Selection for Deep Learning S…

Hydra: A System for Large Multi-Model Deep Learning

Scaling up model depth and size is now a common approach to raise accuracy in many deep learning (DL) applications, as evidenced by the widespread success of multi-billion or even trillion parameter models in natural language processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-05 Kabir Nagrecha, Arun Kumar

SplitBrain: Hybrid Data and Model Parallel Deep Learning

The recent success of deep learning applications has coincided with the wide availability of powerful computational resources for training sophisticated machine learning models on huge datasets. Nonetheless, training large models such as…

Machine Learning · Computer Science 2022-01-03 Farley Lai, Asim Kadav, Erik Kruus

Hydra: A Peer to Peer Distributed Training & Data Collection Framework

The world needs diverse and unbiased data to train deep learning models. Currently, data comes from a variety of sources that are largely unmoderated. Training neural networks with unverified data yields biased…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-27 Vaibhav Mathur, Karanbir Chahal

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used…

Machine Learning · Computer Science 2022-11-08 Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta

Hydra: Dual Exponentiated Memory for Multivariate Time Series Analysis

In recent years, effectively modeling multivariate time series has gained significant popularity, mainly due to its wide range of applications, ranging from healthcare to financial markets and energy management. Transformers, MLPs, and…

Machine Learning · Computer Science 2025-11-04 Asal Meskin, Alireza Mirrokni, Ali Najar, Ali Behrouz

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-30 Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

Hybrid Approach to Parallel Stochastic Gradient Descent

Stochastic Gradient Descent is used to train models on large datasets with reduced training time. On top of that, data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel.…

Machine Learning · Computer Science 2024-07-02 Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Hydra: Hybrid Server Power Model

With the growing complexity of big data workloads that require abundant data and computation, data centers consume a tremendous amount of power daily. In an effort to minimize data center power consumption, several studies developed power…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-22 Nigel Bernard, Hoa Nguyen, Aman Chandan, Savyasachi Jagdeeshan, Namdev Prabhugaonkar, Rutuja Shah, Hyeran Jeon

DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism

Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li

Gear Training: A new way to implement high-performance model-parallel training

The training of Deep Neural Networks usually needs tremendous computing resources. Therefore, many deep models are trained on a large cluster instead of a single machine or GPU. Though most research at present tries to run the whole model on all…

Machine Learning · Computer Science 2018-06-12 Hao Dong, Shuai Li, Dongchang Xu, Yi Ren, Di Zhang

Systems for Parallel and Distributed Large-Model Deep Learning Training

Deep learning (DL) has transformed applications in a variety of domains, including computer vision, natural language processing, and tabular data analysis. The search for improved DL model accuracy has led practitioners to explore…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-10 Kabir Nagrecha

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support

Deep neural networks (DNNs) have been enormously successful in tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to their success, DNNs are being explored for use in ever more…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-20 Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul

Primitives for Dynamic Big Model Parallelism

When training large machine learning models with many variables or parameters, a single machine is often inadequate since the model may be too large to fit in memory, while training can take a long time even with stochastic updates. A…

Machine Learning · Statistics 2014-06-19 Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, Eric P. Xing

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

The dedicated memory of hardware accelerators can be insufficient to store all weights and/or intermediate states of large deep learning models. Although model parallelism is a viable approach to reduce the memory pressure issue,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-27 Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

Two-dimensional Sparse Parallelism for Large Scale Deep Learning Recommendation Model Training

The increasing complexity of deep learning recommendation models (DLRM) has led to a growing need for large-scale distributed systems that can efficiently train on vast amounts of data. In DLRM, the sparse embedding table is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-07 Xin Zhang, Quanyu Zhu, Liangbei Xu, Zain Huda, Wang Zhou, Jin Fang, Dennis van der Staay, Yuxi Hu, Jade Nie, Jiyan Yang, Chunzhi Yang

Hulk: Graph Neural Networks for Optimizing Regionally Distributed Computing Systems

Large deep learning models have shown great potential for delivering exceptional results in various applications. However, the training process can be incredibly challenging due to the models' vast parameter sizes, often consisting of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-14 Zhengqing Yuan, Huiwen Xue, Chao Zhang, Yongming Liu

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-25 Haowei Yang, Yu Tian, Zhongheng Yang, Zhao Wang, Chengrui Zhou, Dannier Li

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies

Neural networks have become a cornerstone of machine learning. As these networks continue to grow more complex, so does the underlying hardware and software infrastructure for training and deployment. In this survey we answer…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-07 Felix Brakel, Uraz Odyurt, Ana-Lucia Varbanescu