Related papers: Optimized Sparse Matrix Operations for Reverse Mod…

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science 2020-09-02 Trevor Gale , Matei Zaharia , Cliff Young , Erich Elsen

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra

Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we…

Hardware Architecture · Computer Science 2020-12-15 Paul Scheffler , Florian Zaruba , Fabian Schuiki , Torsten Hoefler , Luca Benini

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-06 Benjamin Brock , Aydın Buluç , Katherine Yelick

Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU

We present new adaptive format for storing sparse matrices on GPU. We compare it with several other formats including CUSPARSE which is today probably the best choice for processing of sparse matrices on GPU in CUDA. Contrary to CUSPARSE…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-27 Martin Heller , Tomáš Oberhuber

Sparse Matrix Multiplication on CAM Based Accelerator

Sparse matrix multiplication is an important component of linear algebra computations. In this paper, an architecture based on Content Addressable Memory (CAM) and Resistive Content Addressable Memory (ReCAM) is proposed for accelerating…

Hardware Architecture · Computer Science 2017-05-30 Leonid Yavits , Ran Ginosar

Scorch: A Library for Sparse Deep Learning

The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but…

Machine Learning · Computer Science 2024-06-21 Bobby Yan , Alexander J. Root , Trevor Gale , David Broman , Fredrik Kjolstad

A method for accelerating low precision operations by sparse matrix multiplication

In recent years, the fervent demand for computational power across various domains has prompted hardware manufacturers to introduce specialized computing hardware aimed at enhancing computational capabilities. Particularly, the utilization…

Numerical Analysis · Mathematics 2024-03-12 Hongyaoxing Gu

Convex block-sparse linear regression with expanders -- provably

Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used, in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of…

Information Theory · Computer Science 2016-04-05 Anastasios Kyrillidis , Bubacarr Bah , Rouzbeh Hasheminezhad , Quoc Tran-Dinh , Luca Baldassarre , Volkan Cevher

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini , Massimo Bernaschi , Francesco Silvestri , Flavio Vella

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science 2012-12-24 Xintian Yang , Srinivasan Parthasarathy , Ponnuswamy Sadayappan

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang , Aydin Buluc , John D. Owens

Multiresolution kernel matrix algebra

We propose a sparse algebra for samplet compressed kernel matrices, to enable efficient scattered data analysis. We show the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format. It…

Numerical Analysis · Mathematics 2023-05-05 H. Harbrecht , M. Multerer , O. Schenk , Ch. Schwab

dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference

Reducing the memory footprint of neural networks is a crucial prerequisite for deploying them in small and low-cost embedded devices. Network parameters can often be reduced significantly through pruning. We discuss how to best represent…

Data Structures and Algorithms · Computer Science 2021-11-25 Elias Trommer , Bernd Waschneck , Akash Kumar

On Parallel Solution of Sparse Triangular Linear Systems in CUDA

The acceleration of sparse matrix computations on modern many-core processors, such as the graphics processing units (GPUs), has been recognized and studied over a decade. Significant performance enhancements have been achieved for many…

Mathematical Software · Computer Science 2017-10-16 Ruipeng Li

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive…

Machine Learning · Computer Science 2025-05-21 Tiansheng Wen , Yifei Wang , Zequn Zeng , Zhong Peng , Yudi Su , Xinyang Liu , Bo Chen , Hongwei Liu , Stefanie Jegelka , Chenyu You

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems

Sparse linear algebra kernels play a critical role in numerous applications, covering from exascale scientific simulation to large-scale data analytics. Offloading linear algebra kernels on one GPU will no longer be viable in these…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-19 Jieyang Chen , Chenhao Xie , Jesun S Firoz , Jiajia Li , Shuaiwen Leon Song , Kevin Barker , Mark Raugas , Ang Li

CUR from a Sparse Optimization Viewpoint

The CUR decomposition provides an approximation of a matrix $X$ that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of $X$. In this regard, it appears to…

Data Structures and Algorithms · Computer Science 2010-11-02 Jacob Bien , Ya Xu , Michael W. Mahoney

Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI

Matrix-matrix multiplication is a basic operation in linear algebra and an essential building block for a wide range of algorithms in various scientific fields. Theory and implementation for the dense, square matrix case are well-developed.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-01 Alfio Lazzaro , Joost VandeVondele , Juerg Hutter , Ole Schuett

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science 2025-06-19 Hossein Albakri , Kazem Cheshmi

Expressing Sparse Matrix Computations for Productive Performance on Spatial Architectures

This paper addresses spatial programming of sparse matrix computations for productive performance. The challenge is how to express an irregular computation and its optimizations in a regular way. A sparse matrix has (non-zero) values and a…

Mathematical Software · Computer Science 2018-10-18 Hongbo Rong