Related papers: A New Data Layout For Set Intersection on GPUs
Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query…
Graph Pattern Mining (GPM) is an important, rapidly evolving, and computation demanding area. GPM computation relies on subgraph enumeration, which consists in extracting subgraphs that match a given property from an input graph. Graphics…
The ability to timely process significant amounts of continuously updated spatial data is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised…
Hypergraph partitioning is a pervasive NP-hard problem, and accelerating its computation on GPU can both slice time-to-solution and raise quality of results. In this work, we implement a multi-level hypergraph partitioning algorithm on GPU…
In the recent decade companies started collecting of large amount of data. Without a proper analyse, the data are usually useless. The field of analysing the data is called data mining. Unfortunately, the amount of data is quite large: the…
Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines,…
Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…
Graph Convolutional Networks (GCNs) are recently getting much attention in bioinformatics and chemoinformatics as a state-of-the-art machine learning approach with high accuracy. GCNs process convolutional operations along with graph…
Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses…
In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…
The sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g., Social Networks, Knowledge Graphs) on traditional computing architectures (CPU, GPU, or TPU). But the exploration of…
Structural clustering is one of the most popular graph clustering methods, which has achieved great performance improvement by utilizing GPUs. Even though, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…
The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…
Hypergraph partitioning is a recurring NP-hard problem in engineering; its efficient solution at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning aimed at a specific set of…
Nowadays the number of available processing cores within computing nodes which are used in recent clustered environments, are growing up with a rapid rate. Despite this trend, the number of available network interfaces in such computing…
In real-time Visual SLAM systems, local mapping must operate under strict latency constraints, as delays degrade map quality and increase the risk of tracking failure. GPU parallelization offers a promising way to reduce latency. However,…