Related papers: Trie Compression for GPU Accelerated Multi-Pattern…
Pattern matching is a computationally intensive task used in many research fields and real-world applications. Due to the ever-growing volume of data to be processed, and increasing link speeds, the number of patterns to be matched has…
Graphs can be used to represent a wide variety of data belonging to different domains. Graphs can capture the relationship among data in an efficient way, and have been widely used. In recent times, with the advent of Big Data, there has…
String matching is an important part of today's computer applications, and the Aho-Corasick algorithm is one of the main string matching algorithms used to accomplish it. This paper discusses when GPUs can be used for string matching…
Multiple matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick and Wu-Manber, two of the most well-known algorithms for multiple matching, require an increased…
Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, peak performance one order of magnitude higher than that of traditional CPUs. This drop in the cost of computation, as any…
The torrential influx of floating-point data from domains like IoT and HPC necessitates high-performance lossless compression to mitigate storage costs while preserving absolute data fidelity. Leveraging GPU parallelism for this task…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
String matching algorithms are among the most widely used algorithms in computer science. The efficiency of the underlying string matching algorithm will greatly increase the efficiency of any…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
Today's exponentially increasing data volumes and the high cost of storage make compression essential for the Big Data industry. Although research has concentrated on efficient compression, fast decompression is critical for analytics…
With endless amounts of data and very limited bandwidth, fast data compression is one solution for the growing data-sharing problem. Compression helps lower transfer times and save memory, but if the compression takes too long, this no…
Reduction operations are extensively employed in many computational problems. Given a finite set of numeric elements, a reduction combines all elements of that set into a single value using a combiner function. A…
Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…
A priority queue, often implemented as a heap, is an abstract data type that has been used in many well-known applications like Dijkstra's shortest path algorithm, Prim's minimum spanning tree, Huffman encoding, and the branch-and-bound…
Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…
Bloom filters are a fundamental data structure for approximate membership queries, with applications ranging from data analytics to databases and genomics. Several variants have been proposed to accommodate parallel architectures. GPUs,…
Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…
Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…