Related papers: Towards Efficient Sparse Matrix Vector Multiplicat…
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various…
Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for…
Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…
This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to…
Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…
Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…
The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…
In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every…
We propose different implementations of the sparse matrix--dense vector multiplication (\spmv{}) for finite fields and rings $\Zb/m\Zb$. We take advantage of graphic card processors (GPU) and multi-core architectures. Our aim is to improve…
Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…
Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures.…
Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…
Sparse matrices, more specifically SpGEMM kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM…
Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to…
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…
Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. Although software development kits (SDKs) for PIM, such as the UPMEM SDK, provide…
Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. However, the extreme irregularity of real-world sparse matrices prevents existing…
General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…