English
Related papers

Related papers: SparseP: Towards Efficient Sparse Matrix Vector Mu…

200 papers

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…

Hardware Architecture · Computer Science 2022-04-05 Christina Giannoula , Ivan Fernandez , Juan Gómez-Luna , Nectarios Koziris , Georgios Goumas , Onur Mutlu

Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for…

Hardware Architecture · Computer Science 2024-04-09 Mingxuan He , Mithuna Thottethodi , T. N. Vijaykumar

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Jianhua Gao , Bingjie Liu , Weixing Ji , Hua Huang

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to…

Performance · Computer Science 2017-11-16 Athena Elafrou , Georgios Goumas , Nektarios Koziris

The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans , Karl Meerbergen , Raf Vandebril

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang , Guohao Dai , Yu Wang , Yufei Ding , Yuan Xie

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science 2025-02-07 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun , Mohammad Sadrosadati , Geraldo F. Oliveira

In this paper, we propose an optimization selection methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. We propose two models that attempt to identify the major performance bottleneck of the kernel for every…

Performance · Computer Science 2016-01-12 Athena Elafrou , Georgios Goumas , Nectarios Koziris

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci , Christian Trott , Sivasankaran Rajamanickam

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang , Aydin Buluc , John D. Owens

Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures.…

Hardware Architecture · Computer Science 2023-11-20 Chi Zhang , Paul Scheffler , Thomas Benz , Matteo Perotti , Luca Benini

We propose different implementations of the sparse matrix--dense vector multiplication (\spmv{}) for finite fields and rings $\Zb/m\Zb$. We take advantage of graphic card processors (GPU) and multi-core architectures. Our aim is to improve…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-09 Brice Boyer , Jean-Guillaume Dumas , Pascal Giorgi

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari , Sharad Malik

Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-14 Mina Ashoury , Mohammad Loni , Farshad Khunjush , Masoud Daneshtalab

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-26 Ariful Azad , Aydin Buluc

Understanding the scalability of parallel programs is crucial for software optimization and hardware architecture design. As HPC hardware is moving towards many-core design, it becomes increasingly difficult for a parallel program to make…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-21 Donglin Chen , Jianbin Fang , Chuanfu Xu , Shizhao Chen , Zheng Wang

Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and…

Hardware Architecture · Computer Science 2023-11-21 Aman Arora , Jian Weng , Siyuan Ma , Tony Nowatzki , Lizy K. John
‹ Prev 1 2 3 10 Next ›