Related papers: PCNN: Pattern-based Fine-Grained Regular Pruning t…

Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for…

Machine Learning · Computer Science 2024-11-22 Xizi Chen, Jingyang Zhu, Jingbo Jiang, Chi-Ying Tsui

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work focuses on pruning some convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The Nvidia deep neural network (cuDNN) library is the…

Machine Learning · Computer Science 2022-08-30 Marcin Pietroń, Dominik Żurek

1xN Pattern for Pruning Convolutional Neural Networks

Though network pruning receives popularity in reducing the complexity of convolutional neural networks (CNNs), it remains an open issue to concurrently maintain model accuracy as well as achieve significant speedups on general CPUs. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji

FSCNN: A Fast Sparse Convolution Neural Network Inference System

Convolution neural networks (CNNs) have achieved remarkable success, but typically accompany high computation cost and numerous redundant weight parameters. To reduce the FLOPs, structure pruning is a popular approach to remove the entire…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Bo Ji, Tianyi Chen

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The Nvidia deep neural network (cuDNN) library provides the most effective implementation…

Machine Learning · Computer Science 2022-01-03 Marcin Pietroń, Dominik Żurek

StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs

Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods…

Neural and Evolutionary Computing · Computer Science 2019-03-28 Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Xiaolong Ma, Ning Liu, Linfeng Zhang, Jian Tang, Kaisheng Ma, Xue Lin, Makan Fardad, Yanzhi Wang

Stability Based Filter Pruning for Accelerating Deep CNNs

Convolutional neural networks (CNNs) have achieved impressive performance on a wide variety of tasks (classification, detection, etc.) across multiple domains at the cost of high computational and memory requirements. Thus, leveraging CNNs…

Computer Vision and Pattern Recognition · Computer Science 2018-11-21 Pravendra Singh, Vinay Sameer Raja Kadi, Nikhil Verma, Vinay P. Namboodiri

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to…

Machine Learning · Computer Science 2022-03-29 Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are…

Machine Learning · Computer Science 2020-03-06 Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most of the pruning techniques are…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

SPDY: Accurate Pruning with Speedup Guarantees

The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational…

Machine Learning · Computer Science 2022-08-25 Elias Frantar, Dan Alistarh

Structured Pruning for Efficient ConvNets via Incremental Regularization

Parameter pruning is a promising approach for CNN compression and acceleration by eliminating redundant model parameters with tolerable performance loss. Despite its effectiveness, existing regularization-based parameter pruning methods…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Huan Wang, Qiming Zhang, Yuehai Wang, Haoji Hu

Structured Pruning is All You Need for Pruning CNNs at Initialization

Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs). However, a slow retraining or fine-tuning procedure is often required to recover the accuracy loss caused by pruning.…

Computer Vision and Pattern Recognition · Computer Science 2022-06-01 Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang

An Entropy-based Pruning Method for CNN Compression

This paper aims to simultaneously accelerate and compress off-the-shelf CNN models via filter pruning strategy. The importance of each filter is evaluated by the proposed entropy-based method first. Then several unimportant filters are…

Computer Vision and Pattern Recognition · Computer Science 2017-06-20 Jian-Hao Luo, Jianxin Wu

CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from heavy computational workload as the model often comes with large weight…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-13 Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden K.-H. So, Martin Herbordt, Ang Li, Yanzhi Wang

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an…

Computer Vision and Pattern Recognition · Computer Science 2017-09-11 Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and…

Computer Vision and Pattern Recognition · Computer Science 2017-08-01 Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin, Wei Jiang, Wei Wang, Kaidi Xu, Yanzhi Wang, Shan Liu, Songnan Li

Model Compression using Progressive Channel Pruning

In this work, we propose a simple but effective channel pruning framework called Progressive Channel Pruning (PCP) to accelerate Convolutional Neural Networks (CNNs). In contrast to the existing channel pruning methods that prune channels…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Jinyang Guo, Weichen Zhang, Wanli Ouyang, Dong Xu

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks…

Machine Learning · Computer Science 2020-01-23 Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren