Related papers: GX-Plug: a Middleware for Plugging Accelerators to…

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and…

Databases · Computer Science · 2014-02-12 · Reynold S. Xin, Daniel Crankshaw, Ankur Dave, Joseph E. Gonzalez, Michael J. Franklin, Ion Stoica

A Survey on Graph Processing Accelerators: Challenges and Opportunities

A graph is a well-known data structure for representing associated relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems for improving…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-28 · Chuangyi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xinyu Chen, Xiaofei Liao, Hai Jin

Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs

Graph foundation models have demonstrated remarkable adaptability across diverse downstream tasks through large-scale pretraining on graphs. However, existing implementations of the backbone model, graph transformers, are typically limited…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-21 · Jun-Liang Lin, Kamesh Madduri, Mahmut Taylan Kandemir

High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-09-18 · George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Joel H. Saltz

GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-01-22 · Nina Engelhardt, Hayden K.-H. So

Scalable Graph Embedding Learning On A Single GPU

Graph embedding techniques have attracted growing interest since they convert graph data into a continuous, low-dimensional space. Effective graph analytics provides users with a deeper understanding of what is behind the data and thus can…

Machine Learning · Computer Science · 2022-01-21 · Azita Nouri, Philip E. Davis, Pradeep Subedi, Manish Parashar

A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems

We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-15 · Minseok Ryu, Geunyeong Byeon, Kibaek Kim

GraphMat: High performance graph analytics made productive

Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly…

Performance · Computer Science · 2015-03-26 · Narayanan Sundaram, Nadathur Rajagopalan Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey

Accurate, Efficient and Scalable Graph Embedding

The Graph Convolutional Network (GCN) model and its variants are powerful graph embedding tools for facilitating classification and clustering on graphs. However, a major challenge is to reduce the complexity of layered GCNs and make them…

Machine Learning · Computer Science · 2020-08-06 · Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science · 2026-04-14 · Weitian Chen, Shixuan Sun, Cheng Chen, Yongmin Hu, Yingqian Hu, Minyi Guo

Scaling Up Large-Scale Graph Processing for GPU-Accelerated Heterogeneous Systems

Besides offering large host memory to support large-scale graph processing, GPU-accelerated heterogeneous architectures also hold great potential for high-performance computing. However, few existing heterogeneous systems can…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-05 · Xianliang Li

SIMD-X: Programming and Processing of Graph Algorithms on GPUs

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-12-12 · Hang Liu, H. Howie Huang

Pregelix: Big(ger) Graph Analytics on A Dataflow Engine

There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by…

Databases · Computer Science · 2014-07-03 · Yingyi Bu, Vinayak Borkar, Jianfeng Jia, Michael J. Carey, Tyson Condie

Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training

Graph neural networks (GNNs) leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and…

Machine Learning · Computer Science · 2025-10-30 · Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, Abhinav Bhatele

Overcoming Latency-bound Limitations of Distributed Graph Algorithms using the HPX Runtime System

Graph processing at scale presents many challenges, including the irregular structure of graphs, the latency-bound nature of graph algorithms, and the overhead associated with distributed execution. While existing frameworks such as Spark…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-06 · Karame Mohammadiporshokooh, Panagiotis Syskakis, Andrew Lumsdaine, Hartmut Kaiser

An Efficient Dispatcher for Large Scale Graph Processing on OpenCL-based FPGAs

Highly parallel frameworks have proven well suited to graph processing. Various works optimize the implementation on FPGAs, which are pipeline-parallel devices. The key to exploiting the parallel performance of FPGAs is to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-02 · Chengbo Yang

Lighter-X: An Efficient and Plug-and-play Strategy for Graph-based Recommendation through Decoupled Propagation

Graph Neural Networks (GNNs) have demonstrated remarkable effectiveness in recommendation systems. However, conventional graph-based recommenders, such as LightGCN, require maintaining embeddings of size $d$ for each node, resulting in a…

Machine Learning · Computer Science · 2025-10-14 · Yanping Zheng, Zhewei Wei, Frank de Hoog, Xu Chen, Hongteng Xu, Yuhang Ye, Jiadeng Huang

Accelerator Codesign as Non-Linear Optimization

We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable GPGPUs. We first introduce a simple,…

Hardware Architecture · Computer Science · 2017-12-26 · Nirmal Prajapati, Sanjay Rajopadhye, Hristo Djidjev, Nandkishore Santhi, Tobias Grosser, Rumen Andonov

Partitioning Trillion-edge Graphs in Minutes

We introduce XtraPuLP, a new distributed-memory graph partitioner designed to process trillion-edge graphs. XtraPuLP is based on the scalable label propagation community detection technique, which has been demonstrated as a viable means to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-10-25 · George M Slota, Sivasankaran Rajamanickam, Karen Devine, Kamesh Madduri

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPUs) have evolved from specialized hardware for rendering high-quality graphics in games into commodity hardware for efficiently processing blocks of data in parallel. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-03-26 · Luis Cabellos