Related papers: Large-Scale Stochastic Learning using GPUs

200 papers

Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e., is not fully available from the beginning), incremental training…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-06 Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU…

Computer Vision and Pattern Recognition · Computer Science 2013-12-24 Thomas Paine, Hailin Jin, Jianchao Yang, Zhe Lin, Thomas Huang

Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat however is the substantial amount of compute needed to train these deep learning models. Training a benchmark dataset like ImageNet on a…

Machine Learning · Computer Science 2018-10-30 Karanbir Chahal, Manraj Singh Grover, Kuntal Dey

Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-21 Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang

Real-world node embedding applications often contain hundreds of billions of edges with high-dimension node features. Scaling node embedding systems to efficiently support these applications remains a challenging problem. In this paper we…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-19 Wanjing Wei, Yangzihao Wang, Pin Gao, Shijie Sun, Donghai Yu

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner, Thomas Parnell, Martin Jaggi

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the…

We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library (https://github.com/dmlc/xgboost). Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library.…

Machine Learning · Computer Science 2018-07-02 Rory Mitchell, Andrey Adinets, Thejaswi Rao, Eibe Frank

Graph neural network (GNN) has been demonstrated to be a powerful model in many domains for its effectiveness in learning over graphs. To scale GNN training for large graphs, a widely adopted approach is distributed training which…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-19 Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim

With huge amounts of training data, deep learning has made great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-01 Shaohuai Shi, Qiang Wang, Xiaowen Chu, Bo Li

Transformer models trained on long sequences often achieve higher accuracy than short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-09 Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley

Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to…

Machine Learning · Computer Science 2023-07-18 Hongkuan Zhou, Da Zheng, Xiang Song, George Karypis, Viktor Prasanna

With an increasing demand for training powers for deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource…

Machine Learning · Computer Science 2019-05-03 Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

We present distributed algorithms for training dynamic Graph Neural Networks (GNN) on large scale graphs spanning multi-node, multi-GPU systems. To the best of our knowledge, this is the first scaling study on dynamic GNN. We devise…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-17 Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru

Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While…

Robotics · Computer Science 2018-10-25 Jacky Liang, Viktor Makoviychuk, Ankur Handa, Nuttapong Chentanez, Miles Macklin, Dieter Fox

Graph foundation models have demonstrated remarkable adaptability across diverse downstream tasks through large-scale pretraining on graphs. However, existing implementations of the backbone model, graph transformers, are typically limited…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Jun-Liang Lin, Kamesh Madduri, Mahmut Taylan Kandemir

Distributed synchronized GPU training is commonly used for deep learning. The resource constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long queuing time for resource allocation, and lowers the cluster…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-08 Mingzhen Li, Wencong Xiao, Biao Sun, Hanyu Zhao, Hailong Yang, Shiru Ren, Zhongzhi Luan, Xianyan Jia, Yi Liu, Yong Li, Wei Lin, Depei Qian

We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-15 Minseok Ryu, Geunyeong Byeon, Kibaek Kim

The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not employ effectively the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Yujing Ma, Florin Rusu