Related papers: GGArray: A Dynamically Growable GPU Array

200 papers

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science 2023-07-27 Rohith Krishnan S, Venkata Kalyan Tavva, Rupesh Nasre

To break the GPU memory wall for scaling deep learning workloads, a variety of architecture and system techniques have been proposed recently. Their typical approaches include memory extension with flash memory and direct storage access.…

Hardware Architecture · Computer Science 2023-10-17 Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang

Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-29 Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

Structural clustering is one of the most popular graph clustering methods, and it has achieved large performance improvements by utilizing GPUs. Even so, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…

Databases · Computer Science 2023-12-01 Long Yuan, Zeyu Zhou, Xuemin Lin, Zi Chen, Xiang Zhao, Fan Zhang

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science 2025-05-20 Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. In this context, Field-Programmable Gate Arrays (FPGAs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Arturo Urías Jiménez

As an emerging trend in graph-based deep learning, Graph Neural Networks (GNNs) excel in their ability to generate high-quality node feature vectors (embeddings). However, the existing one-size-fits-all GNN implementations are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-22 Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, Yufei Ding

We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This GPU-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on…

Instrumentation and Methods for Astrophysics · Physics 2015-06-15 Chi-kwan Chan, Dimitrios Psaltis, Feryal Ozel

Dynamic graphs, featuring continuously updated vertices and edges, have grown in importance for numerous real-world applications. To accommodate this, graph frameworks, particularly their internal data structures, must support both…

Data Structures and Algorithms · Computer Science 2024-03-06 Abdullah Al Raqibul Islam, Dong Dai

Graph dynamic random walks (GDRWs) have recently emerged as a powerful paradigm for graph analytics and learning applications, including graph embedding and graph neural networks. Despite the fact that many existing studies optimize the…

Hardware Architecture · Computer Science 2023-04-24 Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong

Subgraph matching has garnered increasing attention for its diverse real-world applications. Given the dynamic nature of real-world graphs, addressing evolving scenarios without incurring prohibitive overheads has been a focus of research.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Yunjun Gao, Yang Liu, Zetao Zhang

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science 2026-04-14 Weitian Chen, Shixuan Sun, Cheng Chen, Yongmin Hu, Yingqian Hu, Minyi Guo

Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…

Hardware Architecture · Computer Science 2025-10-10 Anastasios Petropoulos, Theodore Antonakopoulos

In this paper, the acceleration of algorithms using a design of a field programmable gate array (FPGA) as a prototype of a static dataflow architecture is discussed. The static dataflow architecture using operators interconnected by…

Hardware Architecture · Computer Science 2015-03-13 Jorge Luiz e Silva, Joelmir Jose Lopes, Bruno de Abreu Silva, Antonio Carlos Fernandes da Silva

Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…

Hardware Architecture · Computer Science 2022-06-20 Jonas Dann, Daniel Ritter, Holger Fröning

In this paper, we propose a 'full-stack' solution to designing high capacity and low latency on-chip cache hierarchies by starting at the circuit level of the hardware design stack. First, we propose a novel Gain Cell (GC) design using…

Hardware Architecture · Computer Science 2021-10-07 Sarabjeet Singh, Neelam Surana, Pranjali Jain, Joycee Mekie, Manu Awasthi

Graphics Processing Units (GPUs) leverage massive parallelism and large memory bandwidth to support high-performance computing applications, such as multimedia rendering, crypto-mining, deep learning, and natural language processing. These…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-11 Nurlan Nazaraliyev, Elaheh Sadredini, Nael Abu-Ghazaleh

High-performance graph algorithms are difficult to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-16 Carl Yang, Aydin Buluc, John D. Owens