Related papers: Optimizing Structured-Sparse Matrix Multiplication…
Structured sparsity has been proposed as an efficient way to prune the complexity of modern Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. The acceleration of ML models - for both training and…
RISC-V CPUs leverage the RVV (RISC-V Vector) extension to accelerate data-parallel workloads. In addition to arithmetic operations, RVV includes powerful permutation instructions that enable flexible element rearrangement within vector…
This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to…
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various…
The growing computational demands of machine learning (ML) workloads have driven the design of ML accelerators aiming at an optimal tradeoff between efficiency and flexibility. A widely explored architecture for flexible ML accelerators is…
Sparse computations frequently appear in scientific simulations and the performance of these simulations rely heavily on the optimization of the sparse codes. The compact data structures and irregular computation patterns in sparse matrix…
We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…
The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…
In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every…
The rapid development of RISC-V instruction set architecture presents new opportunities and challenges for software developers. Is it sufficient to simply recompile high-performance software optimized for x86-64 onto RISC-V CPUs? Are…
Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…
We introduce an algorithm for efficiently representing convolution with zero-padding and stride as a sparse transformation matrix, applied to a vectorized input through sparse matrix-vector multiplication (SpMV). We provide a theoretical…
This paper addresses spatial programming of sparse matrix computations for productive performance. The challenge is how to express an irregular computation and its optimizations in a regular way. A sparse matrix has (non-zero) values and a…
We consider the problem of developing an efficient multi-threaded implementation of the matrix-vector multiplication algorithm for sparse matrices with structural symmetry. Matrices are stored using the compressed sparse row-column format…
Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group…
RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) becomes of interest for the acceleration of AI…
This paper presents a comprehensive analysis of the RISC-V instruction set architecture, focusing on its modular design, implementation challenges, and performance characteristics. We examine the RV32I base instruction set with extensions…
Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we…
This work evaluates the impact of sparse matrix reordering on the performance of sparse matrix-vector multiplication across different multicore CPU platforms. Reordering can significantly enhance performance by optimizing the non-zero…
This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the…