Related papers: Evolutionary Bin Packing for Memory-Efficient Data…

Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However,…

Hardware Architecture · Computer Science 2020-11-17 Lucian Petrica , Tobias Alonso , Mairin Kroes , Nicholas Fraser , Sorin Cotofana , Michaela Blott

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

Convolutional neural network (CNN) accelerators are being widely used for their efficiency, but they require a large amount of memory, leading to the use of a slow and power consuming external memory. This paper exploits two schemes to…

Hardware Architecture · Computer Science 2022-12-23 Hyeong-Ju Kang

Accelerating CNN inference on FPGAs: A Survey

Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-06 Kamel Abdelouahab , Maxime Pelcat , Jocelyn Serot , François Berry

TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA

In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods…

Machine Learning · Computer Science 2019-11-18 Ali Jahanshahi

f-CNN$^{\text{x}}$: A Toolflow for Mapping Multi-CNN Applications on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Stylianos I. Venieris , Christos-Savvas Bouganis

H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory

Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow…

Hardware Architecture · Computer Science 2024-08-20 Mario Doumet , Marius Stan , Mathew Hall , Vaughn Betz

Full-stack Optimization for Accelerating CNNs with FPGA Validation

We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate arrays (FPGA) implementations. By jointly optimizing CNN models,…

Machine Learning · Computer Science 2019-05-03 Bradley McDanel , Sai Qian Zhang , H. T. Kung , Xin Dong

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder , Shubham Nema , Uday Bondhugula

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

Convolutional neural networks (CNNs) with large kernels, drawing inspiration from the key operations of vision transformers (ViTs), have demonstrated impressive performance in various vision-based applications. To address the issue of…

Hardware Architecture · Computer Science 2024-02-23 Miaoxin Wang , Xiao Wu , Jun Lin , Zhongfeng Wang

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for…

Machine Learning · Computer Science 2026-03-10 Tobias Habermann , Michael Mecik , Zhenyu Wang , César David Vera , Martin Kumm , Mario Garrido

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 Xiaoyu Yu , Yuwei Wang , Jie Miao , Ephrem Wu , Heng Zhang , Yu Meng , Bo Zhang , Biao Min , Dewei Chen , Jianlin Gao

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high…

Hardware Architecture · Computer Science 2016-11-09 Dong Wang , Jianjing An , Ke Xu

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference,…

Machine Learning · Computer Science 2023-07-26 Stylianos I. Venieris , Javier Fernandez-Marques , Nicholas D. Lane

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in a wide range of applications. However, deeper CNN models, which are usually computation consuming, are widely required for complex Artificial…

Systems and Control · Electrical Eng. & Systems 2020-01-08 Chaoyang Zhu , Kejie Huang , Shuyuan Yang , Ziqi Zhu , Hejia Zhang , Haibin Shen

FFCNN: Fast FPGA based Acceleration for Convolution neural network inference

We present a new efficient OpenCL-based Accelerator for large scale Convolutional Neural Networks called Fast Inference on FPGAs for Convolution Neural Network (FFCNN). FFCNN is based on a deeply pipelined OpenCL kernels architecture. As…

Machine Learning · Computer Science 2022-08-30 F. Keddous , H-N. Nguyen , A. Nakib

Late Breaking Result: FPGA-Based Emulation and Fault Injection for CNN Inference Accelerators

A new field programmable gate array (FPGA)-based emulation platform is proposed to accelerate fault tolerance analysis of inference accelerators of convolutional neural networks (CNN). For a given CNN model, hardware accelerator…

Hardware Architecture · Computer Science 2025-07-23 Filip Masar , Vojtech Mrazek , Lukas Sekanina

FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review

Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains. However, their growing complexity has significantly increased computational demands, necessitating efficient hardware…

Machine Learning · Computer Science 2025-05-21 Junye Jiang , Yaan Zhou , Yuanhao Gong , Haoxuan Yuan , Shuanglong Liu

A Resource-Driven Approach for Implementing CNNs on FPGAs Using Adaptive IPs

The increasing demand for real-time, low-latency artificial intelligence applications has propelled the use of Field-Programmable Gate Arrays (FPGAs) for Convolutional Neural Network (CNN) implementations. FPGAs offer reconfigurability,…

Hardware Architecture · Computer Science 2025-10-06 Philippe Magalhães , Virginie Fresse , Benoît Suffran , Olivier Alata

Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are…

Computer Vision and Pattern Recognition · Computer Science 2020-09-04 Duy Thanh Nguyen , Hyun Kim , Hyuk-Jae Lee

Optimization of FPGA-based CNN Accelerators Using Metaheuristics

In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made…

Neural and Evolutionary Computing · Computer Science 2022-09-26 Sadiq M. Sait , Aiman El-Maleh , Mohammad Altakrouri , Ahmad Shawahna