Related papers: GraphR: Accelerating Graph Processing Using ReRAM

Leveraging Recurrent Patterns in Graph Accelerators

Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ computation of ReRAM-based crossbars to maximize computational efficiency. However, existing designs suffer from…

Hardware Architecture · Computer Science 2025-12-02 Masoud Rahimi, Sébastien Le Beux

GRE: A Graph Runtime Engine for Large-Scale Distributed Graph-Parallel Applications

Large-scale distributed graph-parallel computing is challenging. On one hand, due to the irregular computation pattern and lack of locality, it is hard to express parallelism efficiently. On the other hand, due to the scale-free nature,…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-22 Jie Yan, Guangming Tan, Ninghui Sun

GRIP: A Graph Neural Network Accelerator Architecture

We present GRIP, a graph neural network accelerator architecture designed for low-latency inference. Accelerating GNNs is challenging because they combine two distinct types of computation: arithmetic-intensive vertex-centric operations and…

Hardware Architecture · Computer Science 2020-07-31 Kevin Kiningham, Christopher Re, Philip Levis

RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs

All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies. In this work, we propose RAPID-Graph to address…

Hardware Architecture · Computer Science 2026-01-29 Yanru Chen, Zheyu Li, Keming Fan, Runyang Tian, John Hsu, Weihong Xu, Minxuan Zhou, Tajana Rosing

Sage: Parallel Semi-Asymmetric Graph Algorithms for NVRAMs

Non-volatile main memory (NVRAM) technologies provide an attractive set of features for large-scale graph analytics, including byte-addressability, low idle power, and improved memory-density. NVRAM systems today have an order of magnitude…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Laxman Dhulipala, Charlie McGuffey, Hongbo Kang, Yan Gu, Guy E. Blelloch, Phillip B. Gibbons, Julian Shun

A Survey on Graph Processing Accelerators: Challenges and Opportunities

Graph is a well known data structure to represent the associated relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems for improving…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-28 Chuangyi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xinyu Chen, Xiaofei Liao, Hai Jin

HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in graph computing and analytics. However, the irregularity of real-world graphs poses significant challenges to achieving efficient SpMM operation for graph data on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-13 Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Yaofeng Tu

A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks

Graph neural networks (GNNs) have gained significant interest for applications such as citation network analysis and drug discovery due to their ability to apply machine learning techniques on graph-structured data. GNNs typically employ a…

Hardware Architecture · Computer Science 2026-05-28 Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni

Efficient On-Chip Communication for Parallel Graph-Analytics on Spatial Architectures

Large-scale graph processing has drawn great attention in recent years. Most of the modern-day datacenter workloads can be represented in the form of Graph Processing such as MapReduce etc. Consequently, a lot of designs for Domain-Specific…

Hardware Architecture · Computer Science 2022-09-07 Khushal Sethi

EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks

Graph neural networks (GNNs) emerge as a powerful approach to process non-euclidean data structures and have been proved powerful in various application domains such as social networks and e-commerce. While such graph data maintained in…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-06 Shengwen Liang, Ying Wang, Cheng Liu, Lei He, Huawei Li, Xiaowei Li

Fast Processing of Large Graph Applications Using Asynchronous Architecture

Graph algorithms and techniques are increasingly being used in scientific and commercial applications to express relations and explore large data sets. Although conventional or commodity computer architectures, like CPU or GPU, can compute…

Hardware Architecture · Computer Science 2017-07-03 Michel A. Kinsy, Rashmi S. Agrawal, Hien D. Nguyen

ZIPPER: Exploiting Tile- and Operator-level Parallelism for General and Scalable Graph Neural Network Acceleration

Graph neural networks (GNNs) start to gain momentum after showing significant performance improvement in a variety of domains including molecular science, recommendation, and transportation. Turning such performance improvement of GNNs into…

Hardware Architecture · Computer Science 2021-07-20 Zhihui Zhang, Jingwen Leng, Shuwen Lu, Youshan Miao, Yijia Diao, Minyi Guo, Chao Li, Yuhao Zhu

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training:…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-30 Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-mei Hwu, Kaushik Roy

Novel Graph Processor Architecture, Prototype System, and Results

Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 William S. Song, Vitaliy Gleyzer, Alexei Lomakin, Jeremy Kepner

Scaling Up Large-Scale Graph Processing for GPU-Accelerated Heterogeneous Systems

Not only with the large host memory for supporting large scale graph processing, GPU-accelerated heterogeneous architecture can also provide a great potential for high-performance computing. However, few existing heterogeneous systems can…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Xianliang Li

FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing

We introduce FastGraph, a novel GPU-optimized k-nearest neighbor algorithm specifically designed to accelerate graph construction in low-dimensional spaces (2-10 dimensions), critical for high-performance graph neural networks. Our method…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-14 Aarush Agarwal, Raymond He, Jan Kieseler, Matteo Cremonesi, Shah Rukh Qasim

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-16 Carl Yang, Aydin Buluc, John D. Owens

AutoGMap: Learning to Map Large-scale Sparse Graphs on Memristive Crossbars

The sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g., Social Networks, Knowledge Graphs) on traditional computing architectures (CPU, GPU, or TPU). But the exploration of…

Machine Learning · Computer Science 2024-10-28 Bo Lyu, Shengbo Wang, Shiping Wen, Kaibo Shi, Yin Yang, Lingfang Zeng, Tingwen Huang

Efficient Processing of Very Large Graphs in a Small Cluster

Inspired by the success of Google's Pregel, many systems have been developed recently for iterative computation over big graphs. These systems provide a user-friendly vertex-centric programming interface, where a programmer only needs to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-22 Da Yan, Yuzhen Huang, James Cheng, Huanhuan Wu

Flip: Data-Centric Edge CGRA Accelerator

Coarse-Grained Reconfigurable Arrays (CGRA) are promising edge accelerators due to the outstanding balance in flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PE)…

Hardware Architecture · Computer Science 2023-09-20 Dan Wu, Peng Chen, Thilini Kaushalya Bandara, Zhaoying Li, Tulika Mitra