English
Related papers

Related papers: Accelerating Sparse Deep Neural Networks

200 papers

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science 2020-09-02 Trevor Gale , Matei Zaharia , Cliff Young , Erich Elsen

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-19 Cong Guo , Fengchen Xue , Jingwen Leng , Yuxian Qiu , Yue Guan , Weihao Cui , Quan Chen , Minyi Guo

Neural network models are widely used in solving many challenging problems, such as computer vision, personalized recommendation, and natural language processing. Those models are very computationally intensive and reach the hardware limit…

Machine Learning · Computer Science 2020-04-28 Fei Sun , Minghai Qin , Tianyun Zhang , Liu Liu , Yen-Kuang Chen , Yuan Xie

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-01 Cong Guo , Bo Yang Hsueh , Jingwen Leng , Yuxian Qiu , Yue Guan , Zehuan Wang , Xiaoying Jia , Xipeng Li , Minyi Guo , Yuhao Zhu

Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence…

Hardware Architecture · Computer Science 2021-05-21 Yang Wang , Chen Zhang , Zhiqiang Xie , Cong Guo , Yunxin Liu , Jingwen Leng

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini , Massimo Bernaschi , Francesco Silvestri , Flavio Vella

In this paper, we use graphics processing units(GPU) to accelerate sparse and arbitrary structured neural networks. Sparse networks have nodes in the network that are not fully connected with nodes in preceding and following layers, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-12 Aavaas Gajurel , Sushil J. Louis , Frederick C Harris

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, the DNNs require a large amount of computations and memory footprint. Pruning, which produces a sparse neural network, is one of the solutions to…

Hardware Architecture · Computer Science 2026-04-30 Hyunsung Yoon , Sungju Ryu , Jae-Joon Kim

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 xiangyang Ju , Yunsong Wang , Daniel Murnane , Nicholas Choma , Steven Farrell , Paolo Calafiura

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and…

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers , Luke Zettlemoyer

The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-09 Shigang Li , Kazuki Osawa , Torsten Hoefler

Exploiting sparsity underlying neural networks has become one of the most potential methodologies to reduce the memory footprint, I/O cost, and computation workloads during inference. And the degree of sparsity one can exploit has become…

Hardware Architecture · Computer Science 2022-07-19 Ian En-Hsu Yen , Zhibin Xiao , Dongkuan Xu

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong , Po-An Tsai , Abhimanyu R. Bambhaniya , Stephen W. Keckler , Tushar Krishna

Artificial Neural Networks (ANNs) have emerged as hot topics in the research community. Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the…

Neural and Evolutionary Computing · Computer Science 2021-01-19 Shiwei Liu , Decebal Constantin Mocanu , Amarsagar Reddy Ramapuram Matavalam , Yulong Pei , Mykola Pechenizkiy

Graph neural networks have become increasingly popular in recent years due to their ability to naturally encode relational input data and their ability to scale to large graphs by operating on a sparse representation of graph adjacency…

Machine Learning · Statistics 2019-06-28 Matej Balog , Bart van Merriënboer , Subhodeep Moitra , Yujia Li , Daniel Tarlow

Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in…

Machine Learning · Computer Science 2025-08-14 Alessandro Pierro , Steven Abreu , Jonathan Timcheck , Philipp Stratmann , Andreas Wild , Sumit Bam Shrestha

Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demands posed by DNNs have most commonly been addressed through the design of custom…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-30 Sanchari Sen , Shubham Jain , Swagath Venkataramani , Anand Raghunathan

We propose Sparse Neural Network architectures that are based on random or structured bipartite graph topologies. Sparse architectures provide compression of the models learned and speed-ups of computations, they can also surpass their…

Machine Learning · Computer Science 2017-06-20 Alfred Bourely , John Patrick Boueri , Krzysztof Choromonski

Sparse attention is a core building block in many leading neural network models, from graph-structured learning to sparse sequence modeling. It can be decomposed into a sequence of three sparse matrix operations (3S): sampled dense-dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-14 Zitong Li , Aparna Chandramowlishwaran
‹ Prev 1 2 3 10 Next ›