Related papers: Dual-side Sparse Tensor Core

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science 2021-04-20 Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computation and memory requirements pose a challenge to the hardware design. In this work, we leverage the intrinsic…

Machine Learning · Computer Science 2017-11-07 Jingyang Zhu, Jingbo Jiang, Xizi Chen, Chi-Ying Tsui

Griffin: Rethinking Sparse Optimization for Deep Learning Architectures

This paper examines the design space trade-offs of DNNs accelerators aiming to achieve competitive performance and efficiency metrics for all four combinations of dense or sparse activation/weight tensors. To do so, we systematically…

Hardware Architecture · Computer Science 2021-11-03 Jong Hoon Shin, Ali Shafiee, Ardavan Pedram, Hamzah Abdel-Aziz, Ling Li, Joseph Hassoun

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

Nowadays, increasingly larger Deep Neural Networks (DNNs) are being developed, trained, and utilized. These networks require significant computational resources, putting a strain on both advanced and limited devices. Our solution is to…

Machine Learning · Computer Science 2024-07-16 Paolo D'Alberto, Taehee Jeong, Akshai Jain, Shreyas Manjunath, Mrinal Sarmah, Samuel Hsu, Yaswanth Raparti, Nitesh Pipralia

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-01 Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning

Weight pruning in deep neural networks (DNNs) can reduce storage and computation cost, but struggles to bring practical speedup to the model inference time. Tensor-cores can significantly boost the throughput of GPUs on dense computation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-15 Guyue Huang, Haoran Li, Minghai Qin, Fei Sun, Yufei Ding, Yuan Xie

Accelerating Sparse DNNs Based on Tiled GEMM

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-19 Cong Guo, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen, Minyi Guo

Accelerating Training of Deep Neural Networks via Sparse Edge Processing

We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational…

Neural and Evolutionary Computing · Computer Science 2017-11-07 Sourya Dey, Yinan Shao, Keith M. Chugg, Peter A. Beerel

Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks

In principle, sparse neural networks should be significantly more efficient than traditional dense networks. Neurons in the brain exhibit two types of sparsity; they are sparsely interconnected and sparsely active. These two types of…

Machine Learning · Computer Science 2021-12-30 Kevin Lee Hunter, Lawrence Spracklen, Subutai Ahmad

Sparse Computations in Deep Learning Inference

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and…

Computational Engineering, Finance, and Science · Computer Science 2025-12-03 Ioanna Tasou, Panagiotis Mpakos, Angelos Vlachos, Dionysios Adamopoulos, Georgios Giannakopoulos, Konstantinos Katsikopoulos, Ioannis Karaparisis, Maria Lazou, Spyridon Loukovitis, Areti Mei, Anastasia Poulopoulou, Angeliki Dimitriou, Giorgos Filandrianos, Dimitrios Galanopoulos, Vasileios Karampinis, Ilias Mitsouras, Nikolaos Spanos, Petros Anastasiadis, Ioannis Doudalis, Konstantinos Nikas, George Retsinas, Paraskevi Tzouveli, Christina Giannoula, Nectarios Koziris, Nikela Papadopoulou, Giorgos Stamou, Athanasios Voulodimos, Georgios Goumas

Dynamic Sparse Graph for Efficient Deep Learning

We propose to execute deep neural networks (DNNs) with dynamic and sparse graph (DSG) structure for compressive memory and accelerative execution during both training and inference. The great success of DNNs motivates the pursuing of…

Machine Learning · Computer Science 2019-05-08 Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, Yuan Xie

Balanced Sparsity for Efficient DNN Inference on GPU

In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires the customization of hardwares to speed up practical inference. Another trend accelerates sparse model inference…

Computer Vision and Pattern Recognition · Computer Science 2020-10-30 Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

The acceleration of pruned Deep Neural Networks (DNNs) on edge devices such as Microcontrollers (MCUs) is a challenging task, given the tight area- and power-constraints of these devices. In this work, we propose a three-fold contribution…

Machine Learning · Computer Science 2025-03-20 Francesco Daghero, Daniele Jahier Pagliari, Francesco Conti, Luca Benini, Massimo Poncino, Alessio Burrello

Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting

Deep neural networks (DNNs) have shown to provide superb performance in many real life applications, but their large computation cost and storage requirement have prevented them from being deployed to many edge and internet-of-things (IoT)…

Neural and Evolutionary Computing · Computer Science 2021-12-22 Minghai Qin, Tianyun Zhang, Fei Sun, Yen-Kuang Chen, Makan Fardad, Yanzhi Wang, Yuan Xie

SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks

Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demands posed by DNNs have most commonly been addressed through the design of custom…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-30 Sanchari Sen, Shubham Jain, Swagath Venkataramani, Anand Raghunathan

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offer opportunities to further speed up inference. Also, many pruning…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-24 Bingyi Zhang, Viktor Prasanna

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference…

Computer Vision and Pattern Recognition · Computer Science 2023-09-28 Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah

Deep Sparse Coding Using Optimized Linear Expansion of Thresholds

We address the problem of reconstructing sparse signals from noisy and compressive measurements using a feed-forward deep neural network (DNN) with an architecture motivated by the iterative shrinkage-thresholding algorithm (ISTA). We…

Machine Learning · Computer Science 2017-05-23 Debabrata Mahapatra, Subhadip Mukherjee, Chandra Sekhar Seelamantula

DASS: Differentiable Architecture Search for Sparse neural networks

The deployment of Deep Neural Networks (DNNs) on edge devices is hindered by the substantial gap between performance requirements and available processing power. While recent research has made significant strides in developing pruning…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Hamid Mousavi, Mohammad Loni, Mina Alibeigi, Masoud Daneshtalab