Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Distributed, Parallel, and Cluster Computing 2022-02-15 v1

Abstract

Tensor accelerators have gained popularity because they provide a cheap and efficient way to speed up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, because their features are designed specifically for tensor algebra (typically dense matrix products), they are commonly assumed to be unsuitable for applications with sparse data. To challenge this view, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm, with theoretical guarantees on the density, that builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly sparse matrices, our block-based solution, which exploits Nvidia Tensor Cores, is faster than its sparse counterpart. We observed speed-ups of up to two orders of magnitude on real-world sparse matrices.
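
The abstract does not spell out the blocking algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that "1-dimensional blocking" means grouping rows only, and it uses a simple greedy rule based on Jaccard similarity of row sparsity patterns. The names `jaccard`, `group_rows`, `density`, and the threshold `tau` are hypothetical, and the sketch carries no density guarantee of the kind the paper claims.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_rows(rows, tau):
    """Greedily group rows with similar column patterns (illustrative only).

    rows: list of sets, rows[i] holds the column indices of nonzeros in row i.
    tau:  similarity threshold in [0, 1] (hypothetical tuning parameter).
    Returns a list of (row_indices, pattern) pairs, where pattern is the
    union of the column sets of the rows placed in that group.
    """
    groups = []
    for i, cols in enumerate(rows):
        for members, pattern in groups:
            if jaccard(cols, pattern) >= tau:
                members.append(i)      # row joins the first similar group
                pattern.update(cols)   # group pattern grows to cover the row
                break
        else:
            groups.append(([i], set(cols)))
    return groups


def density(rows, group):
    """Fraction of nonzeros in the dense block formed by one row group."""
    members, pattern = group
    nnz = sum(len(rows[i]) for i in members)
    area = len(members) * len(pattern)
    return nnz / area if area else 1.0


# Toy 4-row matrix given by its sparsity pattern (made-up data).
rows = [{0, 1}, {4, 5}, {0, 1}, {4, 5, 6}]
groups = group_rows(rows, tau=0.5)
for g in groups:
    print(g[0], sorted(g[1]), density(rows, g))
```

Each group can then be stored as a dense block of shape (rows in group) x (columns in pattern) and multiplied with a dense routine; a higher `tau` gives denser but smaller blocks, which is the trade-off a blocking scheme of this kind has to balance.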

Cite

@article{arxiv.2202.05868,
  title  = {Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators},
  author = {Paolo Sylos Labini and Massimo Bernaschi and Francesco Silvestri and Flavio Vella},
  journal= {arXiv preprint arXiv:2202.05868},
  year   = {2022}
}

Comments

12 pages, 14 images