Related papers: A Distributed Multi-GPU System for Large-Scale Nod…

200 papers

Learning continuous representations of nodes is attracting growing interest in both academia and industry recently, due to their simplicity and effectiveness in a variety of applications. Most of existing node embedding algorithms and…

Machine Learning · Computer Science 2019-03-05 Zhaocheng Zhu, Shizhen Xu, Meng Qu, Jian Tang

In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU…

Machine Learning · Computer Science 2017-02-24 Thomas Parnell, Celestine Dünner, Kubilay Atasu, Manolis Sifalakis, Haris Pozidis

Graph embedding techniques have attracted growing interest since they convert the graph data into continuous and low-dimensional space. Effective graph analytic provides users a deeper understanding of what is behind the data and thus can…

Machine Learning · Computer Science 2022-01-21 Azita Nouri, Philip E. Davis, Pradeep Subedi, Manish Parashar

Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of…

Machine Learning · Computer Science 2023-07-28 Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi

Distributed training techniques have been widely deployed in large-scale deep neural networks (DNNs) training on dense-GPU clusters. However, on public cloud clusters, due to the moderate inter-connection bandwidth between instances,…

The Graph Convolutional Network (GCN) model and its variants are powerful graph embedding tools for facilitating classification and clustering on graphs. However, a major challenge is to reduce the complexity of layered GCNs and make them…

Machine Learning · Computer Science 2020-08-06 Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna

Graph embedding aims at learning a vector-based representation of vertices that incorporates the structure of the graph. This representation then enables inference of graph properties. Existing graph embedding techniques, however, do not…

Hash tables are used in a plethora of applications, including database operations, DNA sequencing, string searching, and many more. As such, there are many parallelized hash tables targeting multicore, distributed, and accelerator-based…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-05 Alok Tripathy, Oded Green

The edge computing paradigm has emerged to handle cloud computing issues such as scalability, security and low response time among others. This new computing trend heavily relies on ubiquitous embedded systems on the edge. Performance and…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-28 Mohammad Hosseinabady, Mohd Amiruddin Bin Zainol, Jose Nunez-Yanez

Graph Neural Networks (GNNs) have shown success in many real-world applications that involve graph-structured data. Most of the existing single-node GNN training systems are capable of training medium-scale graphs with tens of millions of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-02 Yi-Chien Lin, Viktor Prasanna

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

Training large-scale models relies on a vast number of computing resources. For example, training the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs. It is a challenge to build a large-scale cluster with one type of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-12 Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Quanlu Zhang, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

Temporal Interaction Graphs (TIGs) are widely employed to model intricate real-world systems such as financial systems and social networks. To capture the dynamism and interdependencies of nodes, existing TIG embedding models need to…

Machine Learning · Computer Science 2023-09-12 Xi Chen, Yongxiang Liao, Yun Xiong, Yao Zhang, Siwei Zhang, Jiawei Zhang, Yiheng Sun

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and…

Machine Learning · Computer Science 2026-05-22 Jiachang Liu, Andrea Lodi

Graph embeddings map graph nodes to continuous vectors and are foundational to community detection, recommendation, and many scientific applications. At billion-scale, however, existing graph embedding systems face a trade-off: they either…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-13 Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Feifei Li

Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on…

In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution…

Machine Learning · Computer Science 2019-12-03 Alexey Svyatkovskiy, Julian Kates-Harbeck, William Tang

Graph embedding is a popular algorithmic approach for creating vector representations for individual vertices in networks. Training these algorithms at scale is important for creating embeddings that can be used for classification, ranking,…

Machine Learning · Computer Science 2019-07-04 C. Bayan Bruss, Anish Khazane, Jonathan Rider, Richard Serpe, Saurabh Nagrecha, Keegan E. Hines

Edge-centric distributed computations have appeared as a recent technique to improve the shortcomings of think-like-a-vertex algorithms on large scale-free networks. In order to increase parallelism on this model, edge partitioning -…

Data Structures and Algorithms · Computer Science 2018-10-12 Sebastian Schlag, Christian Schulz, Daniel Seemaier, Darren Strash