Related papers: TorchSparse++: Efficient Training and Inference Fr…

TorchSparse: Efficient Point Cloud Inference Engine

Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user…

Machine Learning · Computer Science · 2022-04-22 · Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient…

Machine Learning · Computer Science · 2025-05-01 · Md Saidul Hoque Anik, Ariful Azad

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…

Machine Learning · Computer Science · 2023-02-22 · Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums

Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are…

Programming Languages · Computer Science · 2025-10-21 · Jaeyeon Won, Willow Ahrens, Joel S. Emer, Saman Amarasinghe

Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-17 · Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-15 · Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Auto-SpMV: Automated Optimizing SpMV Kernels on GPU

Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-02-14 · Mina Ashoury, Mohammad Loni, Farshad Khunjush, Masoud Daneshtalab

An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks

Rotation equivariant graph neural networks, i.e. networks designed to guarantee certain geometric relations between their inputs and outputs, yield state of the art performance on spatial deep learning tasks. They exhibit high data…

Machine Learning · Computer Science · 2025-05-12 · Vivek Bharadwaj, Austin Glover, Aydin Buluc, James Demmel

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science · 2020-09-02 · Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-16 · Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To…

Machine Learning · Computer Science · 2023-06-30 · Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations

Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because…

Machine Learning · Computer Science · 2024-03-25 · Md Saidul Hoque Anik, Pranav Badhe, Rohit Gampa, Ariful Azad

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science · 2025-06-19 · Hossein Albakri, Kazem Cheshmi

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel across scientific computing and machine learning. While prior work accelerates SpMM using Tensor Cores, no existing sparse kernel exploits the asynchronous features of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-21 · Jie Liu, Huanzhi Pu, Zhiru Zhang

Speeding Up Mixed-Integer Programming Solvers with Sparse Learning for Branching

Machine learning is increasingly used to improve decisions within branch-and-bound algorithms for mixed-integer programming. Many existing approaches rely on deep learning, which often requires very large training datasets and substantial…

Machine Learning · Computer Science · 2026-04-02 · Selin Bayramoğlu, George L Nemhauser, Nikolaos V Sahinidis

Scorch: A Library for Sparse Deep Learning

The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but…

Machine Learning · Computer Science · 2024-06-21 · Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-04 · Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu

FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow

As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to…

Machine Learning · Computer Science · 2026-01-27 · Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu

SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training

Training Convolutional Neural Networks (CNNs) usually requires a large number of computational resources. In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity. It mainly involves three…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-28 · Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo, Linghao Song, Yiran Chen, Weisheng Zhao

Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity

Despite numerous efforts to optimize the performance of Sparse Matrix and Vector Multiplication (SpMV) on modern hardware architectures, little work has addressed its sparse counterpart, Sparse Matrix and Sparse Vector Multiplication…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-12-18 · Min Li, Yulong Ao, Chao Yang