Related papers: AxoNN: An asynchronous, message-driven parallel fr…

200 papers

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e.…

Machine Learning · Computer Science 2023-05-16 Siddharth Singh, Abhinav Bhatele

Heavy communication, in particular, collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional…

Machine Learning · Computer Science 2024-05-15 Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

Typically, an ultra-deep neural network (UDNN) tends to yield a high-quality model, but its training process is usually resource-intensive and time-consuming. The scarce DRAM capacity of modern GPUs is the primary bottleneck that hinders the…

Machine Learning · Computer Science 2019-06-21 Jinrong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li

Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In this work, we present a novel four-dimensional hybrid…

Deep learning (DL) has achieved notable successes in many machine learning tasks. A number of frameworks have been developed to expedite the process of designing and training deep neural networks (DNNs), such as Caffe, Torch and Theano.…

Machine Learning · Computer Science 2015-12-22 Hao Zhang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Gunhee Kim, Qirong Ho, Eric Xing

Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs. When applying deep GNNs on large graphs, it is still challenging to perform training in an efficient and scalable way. We propose a novel…

Machine Learning · Computer Science 2020-10-08 Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna

Graph-structured data is ubiquitous in the real world, and Graph Neural Networks (GNNs) have become increasingly popular in various fields due to their ability to process such irregular data directly. However, as data scale, GNNs become…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-10 Xianfeng Song, Yi Zou, Zheng Shi

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-01 Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, Stephen W. Keckler

Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. As a practical solution to train GNN on large graphs with billions of nodes and…

Machine Learning · Computer Science 2024-09-24 Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng

The training process of Deep Neural Network (DNN) is compute-intensive, often taking days to weeks to train a DNN model. Therefore, parallel execution of DNN training on GPUs is a widely adopted approach to speed up the process nowadays.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-29 Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng

GPUs are currently the platform of choice for training neural networks. However, training a deep neural network (DNN) is a time-consuming process even on GPUs because of the massive number of parameters that have to be learned. As a result,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-29 Behnam Pourghassemi, Chenghao Zhang, Joo Hwan Lee, Aparna Chandramowlishwaran

Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction.…

Machine Learning · Computer Science 2021-12-17 Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware.…

Machine Learning · Computer Science 2024-03-20 Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-01 Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

It is important to scale out deep neural network (DNN) training for reducing model training time. The high communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-23 Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen

Deep neural networks (DNN) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on either…

Neural and Evolutionary Computing · Computer Science 2019-04-15 Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, Tajana Rosing

Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Yidong Gong, Saima Afrin, Yuchen Ma, Guannan Wang, Bin Ren, Pradeep Kumar

Graph Neural Networks (GNNs) have been widely adopted due to their strong performance. However, GNN training often relies on expensive, high-performance computing platforms, limiting accessibility for many tasks. Profiling of representative…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-12 Tong Qiao, Ao Zhou, Yingjie Qi, Yiou Wang, Han Wan, Jianlei Yang, Chunming Hu

As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming an essential requirement to achieving acceptable training times. In this paper, we consider the case where future…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-25 Seo Jin Park, Joshua Fried, Sunghyun Kim, Mohammad Alizadeh, Adam Belay