Related papers: GrateTile: Efficient Sparse Tensor Tiling for CNN …

Reuse Kernels or Activations? A Flexible Dataflow for Low-latency Spectral CNN Acceleration

Spectral-domain CNNs have been shown to be more efficient than traditional spatial CNNs in terms of reducing computation complexity. However, they come with a 'kernel explosion' problem that, even after compression (pruning), imposes a high…

Hardware Architecture · Computer Science 2023-10-18 Yue Niu, Rajgopal Kannan, Ajitesh Srivastava, Viktor Prasanna

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-01 Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu

Accelerating Sparse DNNs Based on Tiled GEMM

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-19 Cong Guo, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen, Minyi Guo

Adaptive Pixel-wise Structured Sparse Network for Efficient CNNs

To accelerate deep CNN models, this paper proposes a novel spatially adaptive framework that can dynamically generate pixel-wise sparsity according to the input image. The sparse scheme is pixel-wise refined, regional adaptive under a…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Chen Tang, Wenyu Sun, Zhuqing Yuan, Yongpan Liu

SBNet: Sparse Blocks Network for Fast Inference

Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such…

Computer Vision and Pattern Recognition · Computer Science 2018-06-08 Mengye Ren, Andrei Pokrovsky, Bin Yang, Raquel Urtasun

Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators

Graph convolutional networks (GCNs) are becoming increasingly popular as they can process a wide variety of data formats that prior deep neural networks cannot easily support. One key challenge in designing hardware accelerators for GCNs is…

Machine Learning · Computer Science 2023-01-25 Mingi Yoo, Jaeyong Song, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee

Transform-Based Feature Map Compression for CNN Inference

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have recently been designed. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly…

Image and Video Processing · Electrical Eng. & Systems 2021-06-25 Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array

Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting for CNN accelerators, but the extra processing comes with hardware overhead, so many architectures see only minor gains. Meanwhile,…

Hardware Architecture · Computer Science 2022-09-26 Wenhao Sun, Deng Liu, Zhiwei Zou, Wendi Sun, Yi Kang, Song Chen

HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs

We present both a novel Convolutional Neural Network (CNN) accelerator architecture and a network compiler for FPGAs that outperforms all prior work. Instead of having generic processing elements that together process one layer at a time,…

Hardware Architecture · Computer Science 2020-07-22 Mathew Hall, Vaughn Betz

SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training

Training Convolutional Neural Networks (CNNs) usually requires a large number of computational resources. In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity. It mainly involves three…

Computer Vision and Pattern Recognition · Computer Science 2020-07-28 Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo, Linghao Song, Yiran Chen, Weisheng Zhao

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

Compressing convolutional neural networks (CNNs) has received ever-increasing research focus. However, most existing CNN compression methods do not interpret their inherent structures to distinguish the implicit redundancy. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in a wide range of applications. However, deeper CNN models, which are usually computation consuming, are widely required for complex Artificial…

Systems and Control · Electrical Eng. & Systems 2020-01-08 Chaoyang Zhu, Kejie Huang, Shuyuan Yang, Ziqi Zhu, Hejia Zhang, Haibin Shen

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…

Hardware Architecture · Computer Science 2021-10-13 Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring…

Hardware Architecture · Computer Science 2020-06-25 Ye Yu, Niraj K. Jha

SparsePixels: Efficient Convolution for Sparse Data on FPGAs

Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel regardless of its feature value.…

Hardware Architecture · Computer Science 2025-12-16 Ho Fung Tsoi, Dylan Rankin, Vladimir Loncar, Philip Harris

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM…

Hardware Architecture · Computer Science 2020-10-14 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and…

Computer Vision and Pattern Recognition · Computer Science 2017-08-01 Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs in a wide range of situations, especially mobile…

Neural and Evolutionary Computing · Computer Science 2017-08-16 Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally

SparseMap: Loop Mapping for Sparse CNNs on Streaming Coarse-grained Reconfigurable Array

Streaming coarse-grained reconfigurable array (CGRA) is a promising architecture for data/computing-intensive applications because of its flexibility, high throughput and efficient memory system. However, when accelerating sparse CNNs, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Xiaobing Ni, Mengke Ge, Jiaheng Ruan, Song Chen, Yi Kang

Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping

The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-19 Daejin Jung, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn