Related papers: SparseP: Towards Efficient Sparse Matrix Vector Mu…

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…

Hardware Architecture · Computer Science · 2022-04-05 · Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference

Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for…

Hardware Architecture · Computer Science · 2024-04-09 · Mingxuan He, Mithuna Thottethodi, T. N. Vijaykumar

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-04-10 · Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science · 2023-10-04 · Jinfan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, Onur Mutlu

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to…

Performance · Computer Science · 2017-11-16 · Athena Elafrou, Georgios Goumas, Nektarios Koziris

Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices

The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-27 · Kobe Bergmans, Karl Meerbergen, Raf Vandebril

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-15 · Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science · 2023-04-04 · Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science · 2026-05-20 · Nicola Barcarolo, Brahmaiah Gandham, Mohammad Sadrosadati, Roberto Passerone, Onur Mutlu, Flavio Vella

A Modern Primer on Processing in Memory

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science · 2025-02-07 · Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun, Mohammad Sadrosadati, Geraldo F. Oliveira

A lightweight optimization selection method for Sparse Matrix-Vector Multiplication

In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every…

Performance · Computer Science · 2016-01-12 · Athena Elafrou, Georgios Goumas, Nectarios Koziris

Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-01-10 · Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-13 · Carl Yang, Aydin Buluc, John D. Owens

Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV

Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures.…

Hardware Architecture · Computer Science · 2023-11-20 · Chi Zhang, Paul Scheffler, Thomas Benz, Matteo Perotti, Luca Benini

Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures

We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings $\mathbb{Z}/m\mathbb{Z}$. We take advantage of graphic card processors (GPU) and multi-core architectures. Our aim is to improve…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-09-09 · Brice Boyer, Jean-Guillaume Dumas, Pascal Giorgi

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science · 2019-06-04 · Pareesa Ameneh Golnari, Sharad Malik

Auto-SpMV: Automated Optimizing SpMV Kernels on GPU

Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-02-14 · Mina Ashoury, Mohammad Loni, Farshad Khunjush, Masoud Daneshtalab

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-10-26 · Ariful Azad, Aydin Buluc

Characterizing Scalability of Sparse Matrix-Vector Multiplications on Phytium FT-2000+ Many-cores

Understanding the scalability of parallel programs is crucial for software optimization and hardware architecture design. As HPC hardware is moving towards many-core design, it becomes increasingly difficult for a parallel program to make…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-11-21 · Donglin Chen, Jianbin Fang, Chuanfu Xu, Shizhao Chen, Zheng Wang

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and…

Hardware Architecture · Computer Science · 2023-11-21 · Aman Arora, Jian Weng, Siyuan Ma, Tony Nowatzki, Lizy K. John