Related papers: Gunrock: GPU Graph Analytics

Gunrock: A High-Performance Graph Processing Library on the GPU

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-02-24 · Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens

Fast Gunrock Subgraph Matching (GSM) on GPUs

In this paper, we propose a GPU-efficient subgraph isomorphism algorithm using the Gunrock graph analytic framework, GSM (Gunrock Subgraph Matching), to compute graph matching on GPUs. In contrast to previous approaches on the CPU which are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-03-12 · Leyuan Wang, John D. Owens

Multi-GPU Graph Analytics

We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-03-02 · Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-06-16 · Carl Yang, Aydin Buluc, John D. Owens

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction.…

Machine Learning · Computer Science · 2021-12-17 · Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo

Fast BFS-Based Triangle Counting on GPUs

In this paper, we propose a novel method to compute triangle counting on GPUs. Unlike previous formulations of graph matching, our approach is BFS-based by traversing the graph in an all-source-BFS manner and thus can be mapped onto GPUs in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-09-06 · Leyuan Wang, John D. Owens

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-02-25 · Frank Dehne, Kumanan Yogaratnam

Faster GPU Based Genetic Programming Using A Two Dimensional Stack

Genetic Programming (GP) is a computationally intensive technique which also has a high degree of natural parallelism. Parallel computing architectures have become commonplace especially with regards Graphics Processing Units (GPU). Hence,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-01-05 · Darren M. Chitty

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware.…

Machine Learning · Computer Science · 2024-03-20 · Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

GraphCage: Cache Aware Graph Processing on GPUs

Efficient Graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms is even more difficult to be efficient, since GPU's highly structured SIMT architecture is not a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-05 · Xuhao Chen

Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics

Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…

Databases · Computer Science · 2025-02-14 · Yichao Yuan, Advait Iyer, Lin Ma, Nishil Talati

SIMD-X: Programming and Processing of Graph Algorithms on GPUs

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-12-12 · Hang Liu, H. Howie Huang

Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics

Graph analytics are vital in fields such as social networks, biomedical research, and graph neural networks (GNNs). However, traditional CPUs and GPUs struggle with the memory bottlenecks caused by large graph datasets and their…

Hardware Architecture · Computer Science · 2024-11-25 · Oluwole Jaiyeoba, Abdullah T. Mughrabi, Morteza Baradaran, Beenish Gul, Kevin Skadron

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs

Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…

Hardware Architecture · Computer Science · 2022-06-20 · Jonas Dann, Daniel Ritter, Holger Fröning

FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

Evaluating high-dimensional integrals via deep hierarchical recurrences is a dominant cost in quantum chemistry. While CPUs manage these efficiently, GPUs suffer a critical mismatch: limited per-thread memory is quickly overwhelmed by an…

Computational Physics · Physics · 2026-05-14 · Yihong Zhang, Xinran Wei, Junshi Chen, Fusong Ju, Wei Hu, Jinlong Yang, Huanhuan Xia

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science · 2018-06-28 · Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan

Performance Impact of Memory Channels on Sparse and Irregular Algorithms

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-10 · Oded Green, James Fox, Jeffrey Young, Jun Shirako, David Bader

Deep Graph Library Optimizations for Intel(R) x86 Architecture

The Deep Graph Library (DGL) was designed as a tool to enable structure learning from graphs, by supporting a core abstraction for graphs, including the popular Graph Neural Networks (GNN). DGL contains implementations of all core graph…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-14 · Sasikanth Avancha, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty

Hypergraph Partitioning on GPU with Distinct Incident Hyperedges and Size Constraints

Hypergraph partitioning is a recurring NP-hard problem in engineering; its efficient solution at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning aimed at a specific set of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-21 · Marco Ronzani, Cristina Silvano

Compilation Techniques for Graph Algorithms on GPUs

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-01-11 · Ajay Brahmakshatriya, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, Saman Amarasinghe