English
Related papers

Related papers: Optimizing Structured-Sparse Matrix Multiplication…

200 papers

Structured sparsity has been proposed as an efficient way to prune the complexity of modern Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. The acceleration of ML models - for both training and…

Hardware Architecture · Computer Science 2023-11-14 V. Titopoulos , K. Alexandridis , C. Peltekis , C. Nicopoulos , G. Dimitrakopoulos

RISC-V CPUs leverage the RVV (RISC-V Vector) extension to accelerate data-parallel workloads. In addition to arithmetic operations, RVV includes powerful permutation instructions that enable flexible element rearrangement within vector…

Hardware Architecture · Computer Science 2025-06-02 Vasileios Titopoulos , George Alexakis , Chrysostomos Nicopoulos , Giorgos Dimitrakopoulos

This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to…

Performance · Computer Science 2017-11-16 Athena Elafrou , Georgios Goumas , Nektarios Koziris

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Jianhua Gao , Bingjie Liu , Weixing Ji , Hua Huang

The growing computational demands of machine learning (ML) workloads have driven the design of ML accelerators aiming at an optimal tradeoff between efficiency and flexibility. A widely explored architecture for flexible ML accelerators is…

Hardware Architecture · Computer Science 2025-06-13 Luca Colagrande , Lorenzo Leone , Maximilian Coco , Andrei Deaconeasa , Luca Benini

Sparse computations frequently appear in scientific simulations and the performance of these simulations rely heavily on the optimization of the sparse codes. The compact data structures and irregular computation patterns in sparse matrix…

Programming Languages · Computer Science 2021-12-10 Zachary Cetinic , Kazem Cheshmi , Maryam Mehri Dehnavi

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-26 Ariful Azad , Aydin Buluc

The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans , Karl Meerbergen , Raf Vandebril

In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every…

Performance · Computer Science 2016-01-12 Athena Elafrou , Georgios Goumas , Nectarios Koziris

The rapid development of RISC-V instruction set architecture presents new opportunities and challenges for software developers. Is it sufficient to simply recompile high-performance software optimized for x86-64 onto RISC-V CPUs? Are…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-17 Anna Pirova , Anastasia Vodeneeva , Konstantin Kovalev , Alexander Ustinov , Evgeny Kozinov , Alexey Liniov , Valentin Volokitin , Iosif Meyerov

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari , Sharad Malik

We introduce an algorithm for efficiently representing convolution with zero-padding and stride as a sparse transformation matrix, applied to a vectorized input through sparse matrix-vector multiplication (SpMV). We provide a theoretical…

Machine Learning · Computer Science 2024-12-02 Zan Chaudhry

This paper addresses spatial programming of sparse matrix computations for productive performance. The challenge is how to express an irregular computation and its optimizations in a regular way. A sparse matrix has (non-zero) values and a…

Mathematical Software · Computer Science 2018-10-18 Hongbo Rong

We consider the problem of developing an efficient multi-threaded implementation of the matrix-vector multiplication algorithm for sparse matrices with structural symmetry. Matrices are stored using the compressed sparse row-column format…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-18 Vicente H. F. Batista , George O. Ainsworth , Fernando L. B. Ribeiro

Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-02 Jin Zhang , Jincheng Zhou , Xiang Zhang , Di Ma , Chunye Gong

RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) becomes of interest for the acceleration of AI…

Machine Learning · Computer Science 2025-08-20 Federico Nicolas Peccia , Frederik Haxel , Oliver Bringmann

This paper presents a comprehensive analysis of the RISC-V instruction set architecture, focusing on its modular design, implementation challenges, and performance characteristics. We examine the RV32I base instruction set with extensions…

Hardware Architecture · Computer Science 2025-06-10 Priyanshu Yadav

Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we…

Hardware Architecture · Computer Science 2020-12-15 Paul Scheffler , Florian Zaruba , Fabian Schuiki , Torsten Hoefler , Luca Benini

This work evaluates the impact of sparse matrix reordering on the performance of sparse matrix-vector multiplication across different multicore CPU platforms. Reordering can significantly enhance performance by optimizing the non-zero…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-23 Omid Asudeh , Sina Mahdipour Saravani , Gerald Sabin , Fabrice Rastello , P Sadayappan

This report makes the case that a well-designed Reduced Instruction Set Computer (RISC) can match, and even exceed, the performance and code density of existing commercial Complex Instruction Set Computers (CISC) while maintaining the…

Hardware Architecture · Computer Science 2016-07-11 Christopher Celio , Palmer Dabbelt , David A. Patterson , Krste Asanović
‹ Prev 1 2 3 10 Next ›