Related papers: Compilation Techniques for Graph Algorithms on GPU…

GraphIt: A High-Performance DSL for Graph Analytics

The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. Programmers must try different combinations of a large set of techniques…

Programming Languages · Computer Science · 2018-10-24 · Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, Saman Amarasinghe

Optimizing Ordered Graph Algorithms with GraphIt

Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a…

Programming Languages · Computer Science · 2020-01-28 · Yunming Zhang, Ajay Brahmakshatriya, Xinyi Chen, Laxman Dhulipala, Shoaib Kamil, Saman Amarasinghe, Julian Shun

Graphite: A GPU-Accelerated Mixed-Precision Graph Optimization Framework

We present Graphite, a GPU-accelerated nonlinear least squares graph optimization framework. It provides a CUDA C++ interface to enable the sharing of code between a real-time application, such as a SLAM system, and its optimization tasks.…

Robotics · Computer Science · 2026-03-17 · Shishir Gopinath, Karthik Dantu, Steven Y. Ko

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

Connected components and spanning forest are fundamental graph algorithms due to their use in many important applications, such as graph clustering and image segmentation. GPUs are an ideal platform for graph algorithms due to their high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-08-28 · Changwan Hong, Laxman Dhulipala, Julian Shun

GraphMat: High performance graph analytics made productive

Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly…

Performance · Computer Science · 2015-03-26 · Narayanan Sundaram, Nadathur Rajagopalan Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey

GraphCage: Cache Aware Graph Processing on GPUs

Efficient graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms efficiently is even more difficult, since the GPU's highly structured SIMT architecture is not a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-05 · Xuhao Chen

GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science · 2023-07-27 · Rohith Krishnan S, Venkata Kalyan Tavva, Rupesh Nasre

FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale

Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. As a practical solution to train GNNs on large graphs with billions of nodes and…

Machine Learning · Computer Science · 2024-09-24 · Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-10-04 · Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song

GPU-Accelerated Algorithms for Process Mapping

Process mapping asks for an assignment of the vertices of a task graph to the processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-16 · Petr Samoldekin, Christian Schulz, Henning Woydt

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science · 2026-04-14 · Weitian Chen, Shixuan Sun, Cheng Chen, Yongmin Hu, Yingqian Hu, Minyi Guo

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs

Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck with GPU-accelerated applications is the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-05-01 · Jonah Ekelund, Stefano Markidis, Ivy Peng

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of…

Machine Learning · Computer Science · 2022-04-27 · Yao Xiao, Guixiang Ma, Nesreen K. Ahmed, Mihai Capota, Theodore Willke, Shahin Nazarian, Paul Bogdan

OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems

Large scale graph optimization problems arise in many fields. This paper presents an extensible, high performance framework (named OpenGraphGym-MG) that uses deep reinforcement learning and graph embedding to solve large graph optimization…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-06-25 · Weijian Zheng, Dali Wang, Fengguang Song

A Modular Graph-Native Query Optimization Framework

Complex Graph Patterns (CGPs), which combine pattern matching with relational operations, are widely used in real-world applications. Existing systems rely on monolithic architectures for CGPs, which restrict their ability to integrate…

Databases · Computer Science · 2024-12-13 · Bingqing Lyu, Xiaoli Zhou, Longbin Lai, Yufan Yang, Yunkai Lou, Wenyuan Yu, Jingren Zhou

Gunrock: GPU Graph Analytics

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-01-06 · Yangzihao Wang, Yuechao Pan, Andrew Davidson, Yuduo Wu, Carl Yang, Leyuan Wang, Muhammad Osama, Chenshan Yuan, Weitang Liu, Andy T. Riffel, John D. Owens

Systolic Computing on GPUs for Productive Performance

We propose a language and compiler to productively build high-performance software systolic arrays that run on GPUs. Based on a rigorous mathematical foundation (uniform recurrence equations and space-time transform), our language has…

Programming Languages · Computer Science · 2020-11-02 · Hongbo Rong, Xiaochen Hao, Yun Liang, Lidong Xu, Hong H Jiang, Pradeep Dubey

GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and…

Machine Learning · Computer Science · 2019-03-05 · Zhaocheng Zhu, Shizhen Xu, Meng Qu, Jian Tang

ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms

Connected components is a fundamental kernel in graph applications. The fastest existing parallel multicore algorithms for connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-08-25 · Laxman Dhulipala, Changwan Hong, Julian Shun

Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or…

Programming Languages · Computer Science · 2023-08-29 · Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley