Related papers: SplitBrain: Hybrid Data and Model Parallel Deep Le…

Split Learning for Distributed Collaborative Training of Deep Learning Models in Health Informatics

Deep learning continues to rapidly evolve and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is…

Machine Learning · Computer Science 2023-08-23 Zhuohang Li, Chao Yan, Xinmeng Zhang, Gharib Gharibi, Zhijun Yin, Xiaoqian Jiang, Bradley A. Malin

Model-Parallel Model Selection for Deep Learning Systems

As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users. The newest model architectures are simply too…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-15 Kabir Nagrecha

Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing

Edge AI has been recently proposed to facilitate the training and deployment of Deep Neural Network (DNN) models in proximity to the sources of data. To enable the training of large models on resource-constrained edge devices and protect…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-26 Mingjin Zhang, Jiannong Cao, Yuvraj Sahni, Xiangchun Chen, Shan Jiang

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-19 Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-01 Vishnu Vardhan Baligodugula, Fathi Amsaad

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Deep learning-based emerging scientific workflows often require model training with large, high-dimensional samples, which can make…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-28 Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

Split Learning for collaborative deep learning in healthcare

Shortage of labeled data has been holding the surge of deep learning in healthcare back, as sample sizes are often small, patient information cannot be shared openly, and multi-center collaborative studies are a burden to set up.…

Machine Learning · Computer Science 2019-12-30 Maarten G. Poirot, Praneeth Vepakomma, Ken Chang, Jayashree Kalpathy-Cramer, Rajiv Gupta, Ramesh Raskar

Partitioned Neural Network Training via Synthetic Intermediate Labels

The proliferation of extensive neural network architectures, particularly deep learning models, presents a challenge in terms of resource-intensive training. GPU memory constraints have become a notable bottleneck in training such sizable…

Machine Learning · Computer Science 2025-02-07 Cevat Volkan Karadağ, Nezih Topaloğlu

Spatio-Temporal Split Learning

This paper proposes a novel split learning framework with multiple end-systems in order to realize privacy-preserving deep neural network computation. In conventional split learning frameworks, deep neural network computation is separated…

Machine Learning · Computer Science 2021-08-16 Joongheon Kim, Seunghoon Park, Soyi Jung, Seehwan Yoo

Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting

Federated Learning (FL) enables multiple devices to collaboratively train a shared model while preserving data privacy. Ever-increasing model complexity coupled with limited memory resources on the participating devices severely bottlenecks…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-16 Chunlin Tian, Li Li, Kahou Tam, Yebo Wu, Chengzhong Xu

Split learning for health: Distributed deep learning without sharing raw patient data

Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN…

Machine Learning · Computer Science 2018-12-04 Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, Ramesh Raskar

SuperSFL: Resource-Heterogeneous Federated Split Learning with Weight-Sharing Super-Networks

SplitFed Learning (SFL) combines federated learning and split learning to enable collaborative training across distributed edge devices; however, it faces significant challenges in heterogeneous environments with diverse computational and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-27 Abdullah Al Asif, Sixing Yu, Juan Pablo Munoz, Arya Mazaheri, Ali Jannesari

Workflow Optimization for Parallel Split Learning

Split learning (SL) has been recently proposed as a way to enable resource-constrained devices to train multi-parameter neural networks (NNs) and participate in federated learning (FL). In a nutshell, SL splits the NN model into parts, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-06 Joana Tirana, Dimitra Tsigkari, George Iosifidis, Dimitris Chatzopoulos

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-03 Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark

Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs…

Machine Learning · Statistics 2017-08-22 Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

Effectively Heterogeneous Federated Learning: A Pairing and Split Learning Based Approach

As a promising paradigm, federated learning (FL) is widely used in privacy-preserving machine learning, allowing distributed devices to collaboratively train a model while avoiding data transmission among clients. Despite its immense…

Machine Learning · Computer Science 2023-08-29 Jinglong Shen, Xiucheng Wang, Nan Cheng, Longfei Ma, Conghao Zhou, Yuan Zhang

Split Learning over Wireless Networks: Parallel Design and Resource Management

Split learning (SL) is a collaborative learning framework, which can train an artificial intelligence (AI) model between a device and an edge server by splitting the AI model into a device-side model and a server-side model at a cut layer.…

Networking and Internet Architecture · Computer Science 2023-01-03 Wen Wu, Mushu Li, Kaige Qu, Conghao Zhou, Xuemin Shen, Weihua Zhuang, Xu Li, Weisen Shi