Related papers: BinArray: A Scalable Hardware Accelerator for Bina…

YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel…

Hardware Architecture · Computer Science 2017-02-27 Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

Accurate and Compact Convolutional Neural Networks with Trained Binarization

Although convolutional neural networks (CNNs) are now widely used in various computer vision applications, their heavy demands on parameter storage and computation make deployment on mobile and embedded devices difficult.…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Zhe Xu, Ray C. C. Cheung

Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning

Convolutional neural networks (CNNs) have been widely used for vision-based tasks. Due to their high computational complexity and memory storage requirements, it is hard to directly deploy a full-precision CNN on embedded devices. The…

Computer Vision and Pattern Recognition · Computer Science 2018-02-06 Yixing Li, Fengbo Ren

eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference

Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. Therefore, they have great potential to revolutionize the image pipelines on cameras and displays. However, it is…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-15 Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, Li-De Chen

Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS

Miniature imaging systems are essential for space-constrained applications but are limited by memory and power constraints. While machine learning can reduce data size by extracting key features, its high energy demands often exceed the…

Hardware Architecture · Computer Science 2025-12-02 Yuyang Li, Swasthik Muloor, Jack Laudati, Nickolas Dematteis, Yidam Park, Hana Kim, Nathan Chang, Inhee Lee

A Streaming Accelerator for Deep Convolutional Neural Networks with Image and Feature Decomposition for Resource-limited System Applications

Deep convolutional neural networks (CNNs) are widely used in modern artificial intelligence (AI) and smart vision systems but are also limited by computation latency, throughput, and energy efficiency in resource-limited scenarios, such as…

Hardware Architecture · Computer Science 2017-09-18 Yuan Du, Li Du, Yilei Li, Junjie Su, Mau-Chung Frank Chang

Binary Complex Neural Network Acceleration on FPGA

Being able to learn from complex data with phase information is imperative for many signal processing applications. Today's real-valued deep neural networks (DNNs) have shown efficiency in latent information analysis but fall short when…

Machine Learning · Computer Science 2021-08-11 Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, Tong Geng, Ang Li, Wei Zhang, Minghu Song, Mimi Xie, Hang Liu, Caiwen Ding

Hardware-Efficient Template-Based Deep CNNs Accelerator Design

Acceleration of Convolutional Neural Network (CNN) on edge devices has recently achieved a remarkable performance in image classification and object detection applications. This paper proposes an efficient and scalable CNN-based SoC-FPGA…

Hardware Architecture · Computer Science 2022-07-29 Azzam Alhussain, Mingjie Lin

Towards Accurate Binary Convolutional Neural Network

We introduce a novel scheme to train binary convolutional neural networks (CNNs) -- CNNs with weights and activations constrained to {-1,+1} at run-time. It has been known that using binary weights and activations drastically reduces memory…

Machine Learning · Computer Science 2017-12-01 Xiaofan Lin, Cong Zhao, Wei Pan

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

In the low-bit quantization field, training Binary Neural Networks (BNNs) is the extreme solution to ease the deployment of deep models on resource-constrained devices, having the lowest storage cost and significantly cheaper bit-wise…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Yikai Wang, Yi Yang, Fuchun Sun, Anbang Yao

Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and…

Computer Vision and Pattern Recognition · Computer Science 2017-07-18 Jeng-Hau Lin, Tianwei Xing, Ritchie Zhao, Zhiru Zhang, Mani Srivastava, Zhuowen Tu, Rajesh K. Gupta

BEANNA: A Binary-Enabled Architecture for Neural Network Acceleration

Modern hardware design trends have shifted towards specialized hardware acceleration for computationally intensive tasks like machine learning and computer vision. While these complex workloads can be accelerated by commercial GPUs,…

Hardware Architecture · Computer Science 2021-08-06 Caleb Terrill, Fred Chu

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining a high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7…

Signal Processing · Electrical Eng. & Systems 2021-03-01 Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

Efficient Super Resolution Using Binarized Neural Network

Deep convolutional neural networks (DCNNs) have recently demonstrated high-quality results in single-image super-resolution (SR). DCNNs often suffer from over-parametrization and large amounts of redundancy, which results in inefficient…

Computer Vision and Pattern Recognition · Computer Science 2018-12-18 Yinglan Ma, Hongyu Xiong, Zhe Hu, Lizhuang Ma

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive which makes them unsuitable for mW-devices such as IoT…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-15 Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

VSCNN: Convolution Neural Network Accelerator With Vector Sparsity

Hardware accelerators for convolutional neural networks (CNNs) enable real-time applications of artificial intelligence technology. However, most accelerators only support dense CNN computations or suffer from complex control to support…

Hardware Architecture · Computer Science 2022-05-06 Kuo-Wei Chang, Tian-Sheuan Chang

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…

Hardware Architecture · Computer Science 2021-10-13 Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation

Binary Convolutional Neural Networks (CNNs) can significantly reduce the number of arithmetic operations and the size of memory storage, which makes the deployment of CNNs on mobile or embedded systems more promising. However, the accuracy…

Computer Vision and Pattern Recognition · Computer Science 2020-09-01 Baozhou Zhu, Zaid Al-Ars, Wei Pan

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

Binarized Convolutional Neural Networks for Efficient Inference on GPUs

Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive, which can make their feasible implementation on embedded and low-power devices…

Machine Learning · Computer Science 2018-08-02 Mir Khan, Heikki Huttunen, Jani Boutellier