Related papers: Maple: A Processing Element for Row-Wise Product B…

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of…

Hardware Architecture · Computer Science 2023-01-11 Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy

The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios,…

Machine Learning · Computer Science 2025-08-19 Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren

A method for accelerating low precision operations by sparse matrix multiplication

In recent years, the fervent demand for computational power across various domains has prompted hardware manufacturers to introduce specialized computing hardware aimed at enhancing computational capabilities. Particularly, the utilization…

Numerical Analysis · Mathematics 2024-03-12 Hongyaoxing Gu

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel across scientific computing and machine learning. While prior work accelerates SpMM using Tensor Cores, no existing sparse kernel exploits the asynchronous features of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Jie Liu, Huanzhi Pu, Zhiru Zhang

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…

Mathematical Software · Computer Science 2017-05-23 L. Yavits, A. Morad, R. Ginosar

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-17 Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi

On Parallelizing Matrix Multiplication by the Column-Row Method

We consider the problem of sparse matrix multiplication by the column row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for "consistent" parallel processing of…

Data Structures and Algorithms · Computer Science 2012-11-20 Andrea Campagna, Konstantin Kutzkov, Rasmus Pagh

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari, Sharad Malik

Sparse Tensor Algebra Optimizations with Workspaces

This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for tensor operations called concrete index…

Mathematical Software · Computer Science 2023-10-18 Fredrik Kjolstad, Willow Ahrens, Shoaib Kamil, Saman Amarasinghe

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science 2021-04-20 Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

Staging Blocked Evaluation over Structured Sparse Matrices

The matrices used in many computational settings are naturally sparse, holding a small percentage of nonzero elements. Storing such matrices in specialized sparse formats enables algorithms that avoid wasting computation on zeros,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-13 Pratyush Das, Amirhossein Basareh, Adhitha Dias, Artem Pelenitsyn, Kirshanthan Sundararajah, Milind Kulkarni, Ben Delaware

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and…

Machine Learning · Computer Science 2024-08-30 Sanjali Yadav, Bahar Asgari

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations

Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-25 Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula, Roknoddin Azizi, Skanda Koppula, Nika Mansouri Ghiasi, Taha Shahroodi, Juan Gomez Luna, Onur Mutlu

IOPS: An Unified SpMM Accelerator Based on Inner-Outer-Hybrid Product

Sparse matrix multiplication (SpMM) is widely applied to numerous domains, such as graph processing, machine learning, and data analytics. However, inner product based SpMM induces redundant zero-element computing for mismatched nonzero…

Hardware Architecture · Computer Science 2023-12-21 Wenhao Sun, Wendi Sun, Song Chen, Yi Kang

XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection

Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are…

Machine Learning · Computer Science 2024-05-27 Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu

Parallel structurally-symmetric sparse matrix-vector products on multi-core processors

We consider the problem of developing an efficient multi-threaded implementation of the matrix-vector multiplication algorithm for sparse matrices with structural symmetry. Matrices are stored using the compressed sparse row-column format…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-18 Vicente H. F. Batista, George O. Ainsworth, Fernando L. B. Ribeiro

Efficient Distributed-Memory Parallel Matrix-Vector Multiplication with Wide or Tall Unstructured Sparse Matrices

This paper presents an efficient technique for matrix-vector and vector-transpose-matrix multiplication in distributed-memory parallel computing environments, where the matrices are unstructured, sparse, and have a substantially larger…

Mathematical Software · Computer Science 2018-12-04 Jonathan Eckstein, Gyorgy Matyasfalvi

Sparse Partial-Tracing

Matrices and more generally multidimensional arrays, form the backbone of computational studies. In this paper we demonstrate increases in computational efficiency by performing partial-tracing/tensor-contractions on sparse-arrays. It was…

Data Structures and Algorithms · Computer Science 2023-03-21 Julio Candanedo