Related papers: Sgap: Towards Efficient Sparse Tensor Algebra Comp…
Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…
This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the…
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines,…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…
Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity pattern of the input matrices and the interaction of their patterns make spGEMM…
Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…
Dense and sparse tensors allow the representation of most bulk data structures in computational science applications. We show that sparse tensor algebra can also be used to express many of the transformations on these datasets, especially…
In computational science and data analytics, many workloads involve irregular and sparse computations that are inherently difficult to optimize for modern hardware. A key kernel is Sparse General Matrix-Matrix Multiplication (SpGEMM), which…
Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…
Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General…
Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based…
Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. Using the semiring abstraction, many algorithms can be formulated as SpGEMM, allowing…
Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are…
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient…
Although the matrix multiplication plays a vital role in computational linear algebra, there are few efficient solutions for matrix multiplication of the near-sparse matrices. The Sparse Approximate Matrix Multiply (SpAMM) is one of the…
Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. However, the extreme irregularity of real-world sparse matrices prevents existing…
Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation,…
General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…
Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…
Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result…