Related papers: A MAC-less Neural Inference Processor Supporting C…

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

In this paper we introduce ShiftCNN, a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN is based on a power-of-two weight representation and, as a result, performs only…

Computer Vision and Pattern Recognition · Computer Science · 2017-06-09 · Denis A. Gudovskiy, Luca Rigazio

Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources…

Neural and Evolutionary Computing · Computer Science · 2017-08-17 · James Garland, David Gregg

Dedicated Inference Engine and Binary-Weight Neural Networks for Lightweight Instance Segmentation

Reducing computational costs is an important issue for development of embedded systems. Binary-weight Neural Networks (BNNs), in which weights are binarized and activations are quantized, are employed to reduce computational costs of…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-06 · Tse-Wei Chen, Wei Tao, Dongyue Zhao, Kazuhiro Mima, Tadayuki Ito, Kinya Osa, Masami Kato

Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing

Convolutional neural networks (CNNs) are one of the most successful machine learning techniques for image, voice and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been…

Hardware Architecture · Computer Science · 2018-05-03 · James Garland, David Gregg

MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM)…

Hardware Architecture · Computer Science · 2021-05-26 · Syuan-Hao Sie, Jye-Luen Lee, Yi-Ren Chen, Chih-Cheng Lu, Chih-Cheng Hsieh, Meng-Fan Chang, Kea-Tiong Tang

Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs

CNNs have been shown to maintain reasonable classification accuracy when quantized to lower precisions. Quantizing to sub 8-bit activations and weights can result in accuracy falling below an acceptable threshold. Techniques exist for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-02 · Philip Colangelo, Nasibeh Nasiri, Asit Mishra, Eriko Nurvitadhi, Martin Margala, Kevin Nealis

Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware

As state of the art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for…

Machine Learning · Computer Science · 2025-04-25 · Hans Rosenberger, Rodrigo Fischer, Johanna S. Fröhlich, Ali Bereyhi, Ralf R. Müller

Lossless Compression via Chained Lightweight Neural Predictors with Information Inheritance

This paper is dedicated to lossless data compression with probability estimation using neural networks. First, we propose a probability estimation architecture based on a chain of neural predictors, so that each unit of the chain is defined…

Information Theory · Computer Science · 2026-04-20 · Yuriy Kim, Evgeny Belyaev

Exploiting Kernel Compression on BNNs

Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this…

Hardware Architecture · Computer Science · 2022-12-02 · Franyell Silfa, Jose Maria Arnau, Antonio González

Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA

Binarized Neural Network (BNN) removes bitwidth redundancy in classical CNN by using a single bit (-1/+1) for network parameters and intermediate representations, which has greatly reduced the off-chip data transfer and storage overhead.…

Machine Learning · Computer Science · 2018-10-05 · Cheng Fu, Shilin Zhu, Hao Su, Ching-En Lee, Jishen Zhao

Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration

Systolic array accelerators execute CNNs with energy dominated by the switching activity of multiply accumulate (MAC) units. Although prior work exploits weight dependent MAC power for compression, existing methods often use global…

Hardware Architecture · Computer Science · 2025-12-17 · Jiaxun Fang, Grace Li Zhang, Shaoyi Huang

Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation

Binary Convolutional Neural Networks (CNNs) can significantly reduce the number of arithmetic operations and the size of memory storage, which makes the deployment of CNNs on mobile or embedded systems more promising. However, the accuracy…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-01 · Baozhou Zhu, Zaid Al-Ars, Wei Pan

Compact representations of convolutional neural networks via weight pruning and quantization

The state-of-the-art performance for several real-world problems is currently reached by convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, typically leading to highly performing,…

Machine Learning · Computer Science · 2021-08-31 · Giosuè Cataldo Marinò, Alessandro Petrini, Dario Malchiodi, Marco Frasca

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-11 · Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan

Towards Accurate Binary Convolutional Neural Network

We introduce a novel scheme to train binary convolutional neural networks (CNNs) -- CNNs with weights and activations constrained to {-1,+1} at run-time. It has been known that using binary weights and activations drastically reduce memory…

Machine Learning · Computer Science · 2017-12-01 · Xiaofan Lin, Cong Zhao, Wei Pan

Bayes2IMC: In-Memory Computing for Bayesian Binary Neural Networks

Bayesian Neural Networks (BNNs) provide superior estimates of uncertainty by generating an ensemble of predictive distributions. However, inference via ensembling is resource-intensive, requiring additional entropy sources to generate…

Emerging Technologies · Computer Science · 2025-05-20 · Prabodh Katti, Clement Ruah, Osvaldo Simeone, Bashir M. Al-Hashimi, Bipin Rajendran

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). Exploiting data sparsity is a common approach to further accelerate GEMM…

Hardware Architecture · Computer Science · 2020-10-14 · Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

Deep neural network (DNN) inference using reduced integer precision has been shown to achieve significant improvements in memory utilization and compute throughput with little or no accuracy loss compared to full-precision floating-point.…

Hardware Architecture · Computer Science · 2023-04-11 · Yuzong Chen, Mohamed S. Abdelfattah

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science · 2021-06-17 · Sheng Lin, Wei Jiang, Wei Wang, Kaidi Xu, Yanzhi Wang, Shan Liu, Songnan Li

Logic Design of Neural Networks for High-Throughput and Low-Power Applications

Neural networks (NNs) have been successfully deployed in various fields. In NNs, a large number of multiply-accumulate (MAC) operations need to be performed. Most existing digital hardware platforms rely on parallel MAC units to accelerate…

Systems and Control · Electrical Eng. & Systems · 2023-09-20 · Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li