Related papers: Accelerating Sparse Deep Neural Networks

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science · 2020-09-02 · Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Accelerating Sparse DNNs Based on Tiled GEMM

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-19 · Cong Guo, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen, Minyi Guo

Computation on Sparse Neural Networks: an Inspiration for Future Hardware

Neural network models are widely used in solving many challenging problems, such as computer vision, personalized recommendation, and natural language processing. Those models are very computationally intensive and reach the hardware limit…

Machine Learning · Computer Science · 2020-04-28 · Fei Sun, Minghai Qin, Tianyun Zhang, Liu Liu, Yen-Kuang Chen, Yuan Xie

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracy, sparse models often carry randomly distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-01 · Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu

Dual-side Sparse Tensor Core

Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence…

Hardware Architecture · Computer Science · 2021-05-21 · Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-02-15 · Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

GPU Acceleration of Sparse Neural Networks

In this paper, we use graphics processing units (GPUs) to accelerate sparse and arbitrarily structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-05-12 · Aavaas Gajurel, Sushil J. Louis, Frederick C Harris

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, DNNs require large amounts of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one of the solutions to…

Hardware Architecture · Computer Science · 2026-04-30 · Hyunsung Yoon, Sungju Ryu, Jae-Joon Kim

Benchmarking GPU and TPU Performance with Graph Neural Networks

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural network models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science · 2022-10-25 · Xiangyang Ju, Yunsong Wang, Daniel Murnane, Nicholas Choma, Steven Farrell, Paolo Calafiura

Sparse Computations in Deep Learning Inference

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing to significant computational, energy and…

Computational Engineering, Finance, and Science · Computer Science · 2025-12-03 · Ioanna Tasou, Panagiotis Mpakos, Angelos Vlachos, Dionysios Adamopoulos, Georgios Giannakopoulos, Konstantinos Katsikopoulos, Ioannis Karaparisis, Maria Lazou, Spyridon Loukovitis, Areti Mei, Anastasia Poulopoulou, Angeliki Dimitriou, Giorgos Filandrianos, Dimitrios Galanopoulos, Vasileios Karampinis, Ilias Mitsouras, Nikolaos Spanos, Petros Anastasiadis, Ioannis Doudalis, Konstantinos Nikas, George Retsinas, Paraskevi Tzouveli, Christina Giannoula, Nectarios Koziris, Nikela Papadopoulou, Giorgos Stamou, Athanasios Voulodimos, Georgios Goumas

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science · 2019-08-27 · Tim Dettmers, Luke Zettlemoyer

Efficient Quantized Sparse Matrix Operations on Tensor Cores

The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-09 · Shigang Li, Kazuki Osawa, Torsten Hoefler

S4: a High-sparsity, High-performance AI Accelerator

Exploiting the sparsity underlying neural networks has become one of the most promising methodologies to reduce the memory footprint, I/O cost, and computation workloads during inference. And the degree of sparsity one can exploit has become…

Hardware Architecture · Computer Science · 2022-07-19 · Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science · 2025-05-27 · Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware

Artificial Neural Networks (ANNs) have emerged as hot topics in the research community. Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the…

Neural and Evolutionary Computing · Computer Science · 2021-01-19 · Shiwei Liu, Decebal Constantin Mocanu, Amarsagar Reddy Ramapuram Matavalam, Yulong Pei, Mykola Pechenizkiy

Fast Training of Sparse Graph Neural Networks on Dense Hardware

Graph neural networks have become increasingly popular in recent years due to their ability to naturally encode relational input data and their ability to scale to large graphs by operating on a sparse representation of graph adjacency…

Machine Learning · Statistics · 2019-06-28 · Matej Balog, Bart van Merriënboer, Subhodeep Moitra, Yujia Li, Daniel Tarlow

Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity

Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in…

Machine Learning · Computer Science · 2025-08-14 · Alessandro Pierro, Steven Abreu, Jonathan Timcheck, Philipp Stratmann, Andreas Wild, Sumit Bam Shrestha

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demands posed by DNNs have most commonly been addressed through the design of custom…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-30 · Sanchari Sen, Shubham Jain, Swagath Venkataramani, Anand Raghunathan

Sparse Neural Networks Topologies

We propose Sparse Neural Network architectures that are based on random or structured bipartite graph topologies. Sparse architectures provide compression of the models learned and speed-ups of computations, they can also surpass their…

Machine Learning · Computer Science · 2017-06-20 · Alfred Bourely, John Patrick Boueri, Krzysztof Choromonski

Fused3S: Fast Sparse Attention on Tensor Cores

Sparse attention is a core building block in many leading neural network models, from graph-structured learning to sparse sequence modeling. It can be decomposed into a sequence of three sparse matrix operations (3S): sampled dense-dense…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-05-14 · Zitong Li, Aparna Chandramowlishwaran