Related papers: Memory-Efficient CNN Accelerator Based on Interlay…

Transform-Based Feature Map Compression for CNN Inference

To achieve higher accuracy in machine learning tasks, very deep convolutional neural networks (CNNs) have recently been designed. However, the large memory access of deep CNNs will lead to high power consumption. A variety of hardware-friendly…

Image and Video Processing · Electrical Eng. & Systems 2021-06-25 Yubo Shi, Meiqi Wang, Siyi Chen, Jinghe Wei, Zhongfeng Wang

A Streaming Accelerator for Deep Convolutional Neural Networks with Image and Feature Decomposition for Resource-limited System Applications

Deep convolutional neural networks (CNNs) are widely used in modern artificial intelligence (AI) and smart vision systems but are also limited by computation latency, throughput, and energy efficiency in resource-limited scenarios, such as…

Hardware Architecture · Computer Science 2017-09-18 Yuan Du, Li Du, Yilei Li, Junjie Su, Mau-Chung Frank Chang

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

Convolutional neural network (CNN) accelerators are widely used for their efficiency, but they require a large amount of memory, leading to the use of slow and power-consuming external memory. This paper exploits two schemes to…

Hardware Architecture · Computer Science 2022-12-23 Hyeong-Ju Kang

Hardware-Efficient Template-Based Deep CNNs Accelerator Design

Acceleration of Convolutional Neural Networks (CNNs) on edge devices has recently achieved remarkable performance in image classification and object detection applications. This paper proposes an efficient and scalable CNN-based SoC-FPGA…

Hardware Architecture · Computer Science 2022-07-29 Azzam Alhussain, Mingjie Lin

Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However,…

Hardware Architecture · Computer Science 2020-11-17 Lucian Petrica, Tobias Alonso, Mairin Kroes, Nicholas Fraser, Sorin Cotofana, Michaela Blott

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

DeCoILFNet: Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator

Convolutional Neural Networks (CNNs) are rapidly gaining popularity in varied fields. Due to their increasingly deep and computationally heavy structures, it is difficult to deploy them in energy-constrained mobile applications. Hardware…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-10 Akanksha Baranwal, Ishan Bansal, Roopal Nahar, K. Madhava Krishna

Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA

Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or…

Hardware Architecture · Computer Science 2022-08-10 Cecilia Latotzke, Tim Ciesielski, Tobias Gemmeke

Data-Rate-Aware High-Speed CNN Inference on FPGAs

Dataflow-based CNN accelerators on FPGAs achieve low latency and high throughput by mapping computations of each layer directly to corresponding hardware units. However, layers such as pooling and strided convolutions reduce the data at…

Hardware Architecture · Computer Science 2026-03-11 Tobias Habermann, Martin Kumm

Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA

Deep convolutional neural networks have achieved remarkable progress in recent years. However, the large volume of intermediate results generated during inference poses a significant challenge to the accelerator design for…

Hardware Architecture · Computer Science 2021-05-20 Gang Li, Zejian Liu, Fanrong Li, Jian Cheng

Attention-based Feature Compression for CNN Inference Offloading in Edge Computing

This paper studies the computational offloading of CNN inference in device-edge co-inference systems. Inspired by the emerging paradigm of semantic communication, we propose a novel autoencoder-based CNN architecture (AECNN) for effective…

Computer Vision and Pattern Recognition · Computer Science 2023-02-13 Nan Li, Alexandros Iosifidis, Qi Zhang

TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA

In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods…

Machine Learning · Computer Science 2019-11-18 Ali Jahanshahi

Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition

Neural Network designs are quite diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards the design of efficient accelerators, many works have adopted a dataflow-based, inter-layer pipelined…

Machine Learning · Computer Science 2023-06-23 Zhewen Yu, Christos-Savvas Bouganis

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for…

Machine Learning · Computer Science 2026-03-10 Tobias Habermann, Michael Mecik, Zhenyu Wang, César David Vera, Martin Kumm, Mario Garrido

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…

Hardware Architecture · Computer Science 2025-07-23 Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

Feature Map Transform Coding for Energy-Efficient CNN Inference

Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their…

Computer Vision and Pattern Recognition · Computer Science 2021-09-28 Brian Chmiel, Chaim Baskin, Ron Banner, Evgenii Zheltonozhskii, Yevgeny Yermolin, Alex Karbachevsky, Alex M. Bronstein, Avi Mendelson

Accelerating CNN inference on FPGAs: A Survey

Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-06 Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, François Berry

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI…

Emerging Technologies · Computer Science 2025-07-03 Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou

A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things

Convolutional neural networks (CNNs) offer significant accuracy in image detection. To implement image detection using CNNs in internet of things (IoT) devices, a streaming hardware accelerator is proposed. The proposed accelerator…

Computer Vision and Pattern Recognition · Computer Science 2017-07-12 Li Du, Yuan Du, Yilei Li, Mau-Chung Frank Chang

Hardware Automated Dataflow Deployment of CNNs

Deep Convolutional Neural Networks (CNNs) are the state-of-the-art systems for image classification and scene understanding. However, such techniques are computationally intensive and involve highly regular parallel computation. CNNs can…

Other Computer Science · Computer Science 2018-05-29 Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, Cedric Bourrasset, Jean-Charles Quinton, François Berry