English
Related papers

Related papers: S-STE: Continuous Pruning Function for Efficient 2…

200 papers

Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Aojun Zhou , Yukun Ma , Junnan Zhu , Jianbo Liu , Zhijie Zhang , Kun Yuan , Wenxiu Sun , Hongsheng Li

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-01 Cong Guo , Bo Yang Hsueh , Jingwen Leng , Yuxian Qiu , Yue Guan , Zehuan Wang , Xiaoying Jia , Xipeng Li , Minyi Guo , Yuhao Zhu

Training large transformers is slow, but recent innovations on GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In the light of this…

Machine Learning · Computer Science 2024-10-29 Yuezhou Hu , Kang Zhao , Weiyu Huang , Jianfei Chen , Jun Zhu

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-19 Cong Guo , Fengchen Xue , Jingwen Leng , Yuxian Qiu , Yue Guan , Weihao Cui , Quan Chen , Minyi Guo

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden,…

Hardware Architecture · Computer Science 2022-11-01 Chao Fang , Aojun Zhou , Zhongfeng Wang

We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational…

Neural and Evolutionary Computing · Computer Science 2017-11-07 Sourya Dey , Yinan Shao , Keith M. Chugg , Peter A. Beerel

The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are…

Neural and Evolutionary Computing · Computer Science 2020-09-24 Dingqing Yang , Amin Ghasemazar , Xiaowei Ren , Maximilian Golub , Guy Lemieux , Mieszko Lis

The state-of-the-art deep neural networks (DNNs) have significant computational and data management requirements. The size of both training data and models continue to increase. Sparsification and pruning methods are shown to be effective…

Machine Learning · Computer Science 2021-04-27 Gunduz Vehbi Demirci , Hakan Ferhatosmanoglu

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, the DNNs require a large amount of computations and memory footprint. Pruning, which produces a sparse neural network, is one of the solutions to…

Hardware Architecture · Computer Science 2026-04-30 Hyunsung Yoon , Sungju Ryu , Jae-Joon Kim

Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has…

Machine Learning · Computer Science 2023-09-25 Chao Fang , Wei Sun , Aojun Zhou , Zhongfeng Wang

Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can…

Machine Learning · Computer Science 2023-11-13 Lu Yin , Gen Li , Meng Fang , Li Shen , Tianjin Huang , Zhangyang Wang , Vlado Menkovski , Xiaolong Ma , Mykola Pechenizkiy , Shiwei Liu

Trainings of Large Language Models are generally bottlenecked by matrix multiplications. In the Transformer architecture, a large portion of these operations happens in the Feed Forward Network (FFN), and this portion increases for larger…

Machine Learning · Computer Science 2026-02-09 Meghana Madhyastha , Daniel Haziza , Jesse Cai , Newsha Ardalani , Zhiqi Bu , Carole-Jean Wu

Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally does not…

Machine Learning · Computer Science 2022-02-03 Bradley McDanel , Helia Dinh , John Magallanes

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint.…

Machine Learning · Computer Science 2021-06-08 Siddhant M. Jayakumar , Razvan Pascanu , Jack W. Rae , Simon Osindero , Erich Elsen

Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DNNs). Among different categories of sparsity, structured sparsity has gained more attention due to its efficient execution on modern…

Machine Learning · Computer Science 2022-09-19 Sheng-Chun Kao , Amir Yazdanbakhsh , Suvinay Subramanian , Shivani Agrawal , Utku Evci , Tushar Krishna

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained…

Machine Learning · Computer Science 2021-06-16 Shiwei Liu , Decebal Constantin Mocanu , Yulong Pei , Mykola Pechenizkiy

Sparse training is a natural idea to accelerate the training speed of deep neural networks and save the memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods…

Machine Learning · Computer Science 2021-11-11 Xiao Zhou , Weizhong Zhang , Zonghao Chen , Shizhe Diao , Tong Zhang

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers , Luke Zettlemoyer

State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or…

Machine Learning · Computer Science 2019-12-10 Sangkug Lym , Esha Choukse , Siavash Zangeneh , Wei Wen , Sujay Sanghavi , Mattan Erez

Unstructured pruning reduces the memory footprint in deep neural networks (DNNs). Recently, researchers proposed different types of structural pruning intending to reduce also the computation complexity. In this work, we first suggest a new…

Artificial Intelligence · Computer Science 2021-10-22 Itay Hubara , Brian Chmiel , Moshe Island , Ron Banner , Seffi Naor , Daniel Soudry
‹ Prev 1 2 3 10 Next ›