Related papers: ChainerMN: Scalable Distributed Deep Learning Fram…

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-21 · Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance…

Machine Learning · Computer Science · 2019-08-02 · Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-08-21 · Shaohuai Shi, Qiang Wang, Xiaowen Chu

Characterizing and Understanding Distributed GNN Training on GPUs

Graph neural network (GNN) has been demonstrated to be a powerful model in many domains for its effectiveness in learning over graphs. To scale GNN training for large graphs, a widely adopted approach is distributed training which…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-19 · Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

Theano-MPI: a Theano-based Distributed Training Framework

We develop a scalable and extendable training framework that can utilize GPUs across nodes in a cluster and accelerate the training of deep learning models based on data parallelism. Both synchronous and asynchronous training are…

Machine Learning · Computer Science · 2016-05-27 · He Ma, Fei Mao, Graham W. Taylor

Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-17 · Davit Buniatyan

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash

Scaling the distributed deep learning to a massive GPU cluster level is challenging due to the instability of the large mini-batch training and the overhead of the gradient synchronization. We address the instability of the large mini-batch…

Machine Learning · Computer Science · 2019-03-06 · Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama

Efficient Scaling of Dynamic Graph Neural Networks

We present distributed algorithms for training dynamic Graph Neural Networks (GNN) on large scale graphs spanning multi-node, multi-GPU systems. To the best of our knowledge, this is the first scaling study on dynamic GNN. We devise…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-09-17 · Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

It is important to scale out deep neural network (DNN) training for reducing model training time. The high communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-23 · Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen

PowerAI DDL

As deep neural networks become more complex and input datasets grow larger, it can take days or even weeks to train a deep neural network to the desired accuracy. Therefore, distributed Deep Learning at a massive scale is a critical…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-08 · Minsik Cho, Ulrich Finkler, Sameer Kumar, David Kung, Vaibhav Saxena, Dheeraj Sreedhar

Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting

We propose a distributed approach to train deep neural networks (DNNs), which has guaranteed convergence theoretically and great scalability empirically: close to 6 times faster on instance of ImageNet data set when run with 6 machines. The…

Machine Learning · Statistics · 2016-10-04 · Abhimanu Kumar, Pengtao Xie, Junming Yin, Eric P. Xing

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these…

Machine Learning · Computer Science · 2023-05-16 · Siddharth Singh, Abhinav Bhatele

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible. It is challenging due to large memory capacity and bandwidth…

Machine Learning · Computer Science · 2021-04-19 · Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed and implemented for training DNNs using geo-distributed…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-17 · Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu

Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution…

Machine Learning · Computer Science · 2019-12-03 · Alexey Svyatkovskiy, Julian Kates-Harbeck, William Tang

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph

Graph Neural Networks (GNNs) show strong promise for circuit analysis, but scaling to modern large-scale circuit graphs is limited by GPU memory and training cost, especially for deep models. We revisit deep GNNs for circuit graphs and show…

Machine Learning · Computer Science · 2026-03-31 · Yuebo Luo, Shiyang Li, Yifei Feng, Vishal Kancharla, Shaoyi Huang, Caiwen Ding

Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Most research on novel techniques for 3D Medical Image Segmentation (MIS) is currently done using Deep Learning with GPU accelerators. The principal challenge of such technique is that a single input can easily cope computing resources, and…

Machine Learning · Computer Science · 2021-11-01 · Josep Lluis Berral, Oriol Aranda, Juan Luis Dominguez, Jordi Torres

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks

Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat however is the substantial amount of compute needed to train these deep learning models. Training a benchmark dataset like ImageNet on a…

Machine Learning · Computer Science · 2018-10-30 · Karanbir Chahal, Manraj Singh Grover, Kuntal Dey

Multi-Resolution Graph Neural Network for Large-Scale Pointcloud Segmentation

In this paper, we propose a multi-resolution deep-learning architecture to semantically segment dense large-scale pointclouds. Dense pointcloud data require a computationally expensive feature encoding process before semantic segmentation.…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-25 · Liuyue Xie, Tomotake Furuhata, Kenji Shimada

Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-08 · Sam Ade Jacobs, Brian Van Essen, David Hysom, Jae-Seung Yeom, Tim Moon, Rushil Anirudh, Jayaraman J. Thiagaranjan, Shusen Liu, Peer-Timo Bremer, Jim Gaffney, Tom Benson, Peter Robinson, Luc Peterson, Brian Spears