Related papers: At-Scale Sparse Deep Neural Network Inference with…

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training

Parallel training of neural networks at scale is challenging due to significant overheads arising from communication. Recently, deep learning researchers have developed a variety of pruning algorithms that are capable of pruning (i.e.…

Machine Learning · Computer Science 2023-05-16 Siddharth Singh, Abhinav Bhatele

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science 2020-09-02 Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Accelerating SpMM Kernel with Cache-First Edge Sampling for Graph Neural Networks

Graph neural networks (GNNs), an emerging deep learning model class, can extract meaningful representations from highly expressive graph-structured data and are therefore gaining popularity for wider ranges of applications. However, current…

Machine Learning · Computer Science 2021-04-27 Chien-Yu Lin, Liang Luo, Luis Ceze

Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network. Following recent work in simplifying these networks with model pruning…

Neural and Evolutionary Computing · Computer Science 2018-04-30 Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie

Sparse Computations in Deep Learning Inference

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and…

Computational Engineering, Finance, and Science · Computer Science 2025-12-03 Ioanna Tasou, Panagiotis Mpakos, Angelos Vlachos, Dionysios Adamopoulos, Georgios Giannakopoulos, Konstantinos Katsikopoulos, Ioannis Karaparisis, Maria Lazou, Spyridon Loukovitis, Areti Mei, Anastasia Poulopoulou, Angeliki Dimitriou, Giorgos Filandrianos, Dimitrios Galanopoulos, Vasileios Karampinis, Ilias Mitsouras, Nikolaos Spanos, Petros Anastasiadis, Ioannis Doudalis, Konstantinos Nikas, George Retsinas, Paraskevi Tzouveli, Christina Giannoula, Nectarios Koziris, Nikela Papadopoulou, Giorgos Stamou, Athanasios Voulodimos, Georgios Goumas

Balanced Sparsity for Efficient DNN Inference on GPU

In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires the customization of hardwares to speed up practical inference. Another trend accelerates sparse model inference…

Computer Vision and Pattern Recognition · Computer Science 2020-10-30 Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computation and memory requirements pose a challenge to the hardware design. In this work, we leverage the intrinsic…

Machine Learning · Computer Science 2017-11-07 Jingyang Zhu, Jingbo Jiang, Xizi Chen, Chi-Ying Tsui

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

Graph Convolutional Networks (GCNs) are recently getting much attention in bioinformatics and chemoinformatics as a state-of-the-art machine learning approach with high accuracy. GCNs process convolutional operations along with graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-28 Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

Partitioning sparse deep neural networks for scalable training and inference

The state-of-the-art deep neural networks (DNNs) have significant computational and data management requirements. The size of both training data and models continue to increase. Sparsification and pruning methods are shown to be effective…

Machine Learning · Computer Science 2021-04-27 Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-13 Carl Yang, Aydin Buluc, John D. Owens

Sparse Deep Neural Network Graph Challenge

The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The proposed Sparse Deep Neural Network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-03 Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Ryan Robinett, Sid Samsi

SparseDNN: Fast Sparse Deep Learning Inference on CPUs

The last few years have seen gigantic leaps in algorithms and systems to support efficient deep learning inference. Pruning and quantization algorithms can now consistently compress neural networks by an order of magnitude. For a compressed…

Machine Learning · Computer Science 2021-07-22 Ziheng Wang

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Graph Neural Networks (GNNs) have achieved significant improvements in various domains. Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operator in GNNs, which performs a multiplication between a sparse matrix and a dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-08 Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang

Accelerating Training of Deep Neural Networks via Sparse Edge Processing

We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational…

Neural and Evolutionary Computing · Computer Science 2017-11-07 Sourya Dey, Yinan Shao, Keith M. Chugg, Peter A. Beerel

GraphChallenge.org Sparse Deep Neural Network Performance

The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN)…

Machine Learning · Computer Science 2020-12-24 Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Albert Reuther, Ryan Robinett, Sid Samsi

Hierarchical Block Sparse Neural Networks

Sparse deep neural networks (DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in computation of sparse DNNs, their efficiencies are much lower than that of dense DNNs on regular parallel…

Machine Learning · Computer Science 2018-12-31 Dharma Teja Vooturi, Dheevatsa Mudigere, Sasikanth Avancha

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs

Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie…

Hardware Architecture · Computer Science 2024-10-30 Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs

Fueled by the ability to mine real-world graph data, GNN applications have experienced phenomenal growth. Sparse Matrix-Matrix Multiplication (SpMM) is a critical operator in GNNs. However, existing SpMM designs for GNNs struggle to adapt…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Lixing Zhang, Guanhua Ye, Hongzheng Li, Shigang Li, Yingxia Shao

GPU Acceleration of Sparse Neural Networks

In this paper, we use graphics processing units (GPU) to accelerate sparse and arbitrary structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-12 Aavaas Gajurel, Sushil J. Louis, Frederick C Harris