Related papers: GGArray: A Dynamically Growable GPU Array

GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science 2023-07-27 Rohith Krishnan S, Venkata Kalyan Tavva, Rupesh Nasre

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

To break the GPU memory wall for scaling deep learning workloads, a variety of architecture and system techniques have been proposed recently. Their typical approaches include memory extension with flash memory and direct storage access.…

Hardware Architecture · Computer Science 2023-10-17 Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-29 Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

GPUSCAN$^{++}$: Efficient Structural Graph Clustering on GPUs

Structural clustering is one of the most popular graph clustering methods, which has achieved great performance improvement by utilizing GPUs. Even though, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…

Databases · Computer Science 2023-12-01 Long Yuan, Zeyu Zhou, Xuemin Lin, Zi Chen, Xiang Zhao, Fan Zhang

Addressing memory bandwidth scalability in vector processors for streaming applications

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science 2025-05-20 Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI

AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. In this context, Field-Programmable Gate Arrays (FPGAs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Arturo Urías Jiménez

GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs

As the emerging trend of graph-based deep learning, Graph Neural Networks (GNNs) excel for their capability to generate high-quality node feature vectors (embeddings). However, the existing one-size-fits-all GNN implementations are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-22 Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, Yufei Ding

GRay: a Massively Parallel GPU-Based Code for Ray Tracing in Relativistic Spacetimes

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This GPU-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on…

Instrumentation and Methods for Astrophysics · Physics 2015-06-15 Chi-kwan Chan, Dimitrios Psaltis, Feryal Ozel

DGAP: Efficient Dynamic Graph Analysis on Persistent Memory

Dynamic graphs, featuring continuously updated vertices and edges, have grown in importance for numerous real-world applications. To accommodate this, graph frameworks, particularly their internal data structures, must support both…

Data Structures and Algorithms · Computer Science 2024-03-06 Abdullah Al Raqibul Islam, Dong Dai

LightRW: FPGA Accelerated Graph Dynamic Random Walks

Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the…

Hardware Architecture · Computer Science 2023-04-24 Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong

GPU-Accelerated Batch-Dynamic Subgraph Matching

Subgraph matching has garnered increasing attention for its diverse real-world applications. Given the dynamic nature of real-world graphs, addressing evolving scenarios without incurring prohibitive overheads has been a focus of research.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Yunjun Gao, Yang Liu, Zetao Zhang

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan

Easy Acceleration with Distributed Arrays

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Jeremy Kepner, Chansup Byun, LaToya Anderson, William Arcand, David Bestor, William Bergeron, Alex Bonn, Daniel Burrill, Vijay Gadepally, Ryan Haney, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Peter Michaleas

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science 2026-04-14 Weitian Chen, Shixuan Sun, Cheng Chen, Yongmin Hu, Yingqian Hu, Minyi Guo

A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations

Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…

Hardware Architecture · Computer Science 2025-10-10 Anastasios Petropoulos, Theodore Antonakopoulos

Accelerating Algorithms using a Dataflow Graph in a Reconfigurable System

In this paper, the acceleration of algorithms using a design of a field programmable gate array (FPGA) as a prototype of a static dataflow architecture is discussed. The static dataflow architecture using operators interconnected by…

Hardware Architecture · Computer Science 2015-03-13 Jorge Luiz e Silva, Joelmir Jose Lopes, Bruno de Abreu Silva, Antonio Carlos Fernandes da Silva

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs

Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…

Hardware Architecture · Computer Science 2022-06-20 Jonas Dann, Daniel Ritter, Holger Fröning

HyGain: High Performance, Energy-Efficient Hybrid Gain Cell based Cache Hierarchy

In this paper, we propose a 'full-stack' solution to designing high capacity and low latency on-chip cache hierarchies by starting at the circuit level of the hardware design stack. First, we propose a novel Gain Cell (GC) design using…

Hardware Architecture · Computer Science 2021-10-07 Sarabjeet Singh, Neelam Surana, Pranjali Jain, Joycee Mekie, Manu Awasthi

GPUVM: GPU-driven Unified Virtual Memory

Graphics Processing Units (GPUs) leverage massive parallelism and large memory bandwidth to support high-performance computing applications, such as multimedia rendering, crypto-mining, deep learning, and natural language processing. These…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-11 Nurlan Nazaraliyev, Elaheh Sadredini, Nael Abu-Ghazaleh

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-16 Carl Yang, Aydin Buluc, John D. Owens