Related papers: A New Data Layout For Set Intersection on GPUs

GraphMatch: Subgraph Query Processing on FPGAs

Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query…

Databases · Computer Science 2024-02-28 Jonas Dann , Tobias Götz , Daniel Ritter , Jana Giceva , Holger Fröning

Efficient Strategies for Graph Pattern Mining Algorithms on GPUs

Graph Pattern Mining (GPM) is an important, rapidly evolving, and computation demanding area. GPM computation relies on subgraph enumeration, which consists in extracting subgraphs that match a given property from an input graph. Graphics…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-12 Samuel Ferraz , Vinicius Dias , Carlos H. C. Teixeira , George Teodoro , Wagner Meira

Manycore processing of repeated range queries over massive moving objects observations

The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised…

Databases · Computer Science 2014-11-13 Francesco Lettich , Salvatore Orlando , Claudio Silvestri , Christian S. Jensen

Incidence Constraints in Hypergraph Partitioning on GPU

Hypergraph partitioning is a pervasive NP-hard problem, and accelerating its computation on GPU can both slice time-to-solution and raise quality of results. In this work, we implement a multi-level hypergraph partitioning algorithm on GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-17 Marco Ronzani , Cristina Silvano

Parallel algorithms for mining of frequent itemsets

In the recent decade companies started collecting of large amount of data. Without a proper analyse, the data are usually useless. The field of analysing the data is called data mining. Unfortunately, the amount of data is quite large: the…

Databases · Computer Science 2021-08-12 Robert Kessl

GPU-Accelerated Algorithms for Process Mapping

Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-16 Petr Samoldekin , Christian Schulz , Henning Woydt

A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines,…

Mathematical Software · Computer Science 2015-09-15 Weifeng Liu , Brian Vinter

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science 2026-04-14 Weitian Chen , Shixuan Sun , Cheng Chen , Yongmin Hu , Yingqian Hu , Minyi Guo

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang , Aydin Buluc , John D. Owens

A Graph-based Model for GPU Caching Problems

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

Thread Batching for High-performance Energy-efficient GPU Memory Design

Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…

Hardware Architecture · Computer Science 2019-06-17 Bing Li , Mengjie Mao , Xiaoxiao Liu , Tao Liu , Zihao Liu , Wujie Wen , Yiran Chen , Hai , Li

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

Graph Convolutional Networks (GCNs) are recently getting much attention in bioinformatics and chemoinformatics as a state-of-the-art machine learning approach with high accuracy. GCNs process convolutional operations along with graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-28 Yusuke Nagasaka , Akira Nukada , Ryosuke Kojima , Satoshi Matsuoka

GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs

Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…

Data Structures and Algorithms · Computer Science 2023-07-27 Rohith Krishnan S , Venkata Kalyan Tavva , Rupesh Nasre

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne , Kumanan Yogaratnam

AutoGMap: Learning to Map Large-scale Sparse Graphs on Memristive Crossbars

The sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g., Social Networks, Knowledge Graphs) on traditional computing architectures (CPU, GPU, or TPU). But the exploration of…

Machine Learning · Computer Science 2024-10-28 Bo Lyu , Shengbo Wang , Shiping Wen , Kaibo Shi , Yin Yang , Lingfang Zeng , Tingwen Huang

GPUSCAN$^{++}$:Efficient Structural Graph Clustering on GPUs

Structural clustering is one of the most popular graph clustering methods, which has achieved great performance improvement by utilizing GPUs. Even though, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…

Databases · Computer Science 2023-12-01 Long Yuan , Zeyu Zhou , Xuemin Lin , Zi Chen , Xiang Zhao , Fan Zhang

Techniques for Shared Resource Management in Systems with Throughput Processors

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Hypergraph Partitioning on GPU with Distinct Incident Hyperedges and Size Constraints

Hypergraph partitioning is a recurring NP-hard problem in engineering; its efficient solution at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning aimed at a specific set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-21 Marco Ronzani , Cristina Silvano

A Novel Process Mapping Strategy in Clustered Environments

Nowadays the number of available processing cores within computing nodes which are used in recent clustered environments, are growing up with a rapid rate. Despite this trend, the number of available network interfaces in such computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-13 Mohsen Soryani , Morteza Analoui , Ghobad Zarrinchian

TurboMap: GPU-Accelerated Local Mapping for Visual SLAM

In real-time Visual SLAM systems, local mapping must operate under strict latency constraints, as delays degrade map quality and increase the risk of tracking failure. GPU parallelization offers a promising way to reduce latency. However,…

Robotics · Computer Science 2026-03-19 Parsa Hosseininejad , Kimia Khabiri , Shishir Gopinath , Soudabeh Mohammadhashemi , Karthik Dantu , Steven Y. Ko