Related papers: Sgap: Towards Efficient Sparse Tensor Algebra Comp…

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…

Programming Languages · Computer Science 2026-04-23 Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, Fredrik Kjolstad

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-30 Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines,…

Mathematical Software · Computer Science 2015-09-15 Weifeng Liu, Brian Vinter

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang, Aydin Buluc, John D. Owens

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity pattern of the input matrices and the interaction of their patterns make spGEMM…

Mathematical Software · Computer Science 2020-10-01 Orestis Zachariadis, Nitin Satpute, Juan Gómez-Luna, Joaquín Olivares

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science 2025-06-19 Hossein Albakri, Kazem Cheshmi

Sparse Tensor Algebra as a Parallel Programming Model

Dense and sparse tensors allow the representation of most bulk data structures in computational science applications. We show that sparse tensor algebra can also be used to express many of the transformations on these datasets, especially…

Mathematical Software · Computer Science 2015-12-02 Edgar Solomonik, Torsten Hoefler

Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU

In computational science and data analytics, many workloads involve irregular and sparse computations that are inherently difficult to optimize for modern hardware. A key kernel is Sparse General Matrix-Matrix Multiplication (SpGEMM), which…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-22 Yifan Li, Giulia Guidi

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…

Machine Learning · Computer Science 2023-02-22 Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring

Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General…

Programming Languages · Computer Science 2024-08-20 Adhitha Dias, Logan Anderson, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni

A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based…

Mathematical Software · Computer Science 2017-12-18 Bangtian Liu, Chengyao Wen, Anand D. Sarwate, Maryam Mehri Dehnavi

Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication

Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-23 Thomas McFarland, Julian Bellavita, Giulia Guidi

Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums

Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are…

Programming Languages · Computer Science 2025-10-21 Jaeyeon Won, Willow Ahrens, Joel S. Emer, Saman Amarasinghe

Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments

Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-19 Aydin Buluc, John Gilbert

Accelerating Sparse Approximate Matrix Multiplication on GPUs

Although the matrix multiplication plays a vital role in computational linear algebra, there are few efficient solutions for matrix multiplication of the near-sparse matrices. The Sparse Approximate Matrix Multiply (SpAMM) is one of the…

Performance · Computer Science 2022-10-25 Xiaoyan Liu, Yi Liu, Ming Dun, Bohong Yin, Hailong Yang, Zhongzhi Luan, Depei Qian

RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. However, the extreme irregularity of real-world sparse matrices prevents existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-11 Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-17 Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…

Hardware Architecture · Computer Science 2022-01-25 Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

Compilation of Modular and General Sparse Workspaces

Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result…

Programming Languages · Computer Science 2024-04-09 Genghan Zhang, Olivia Hsu, Fredrik Kjolstad