Related papers: Design optimization for high-performance computing…

Implementation of high-efficiency, lightweight residual spiking neural network processor based on field-programmable gate arrays

With the development of hardware-optimized deployment of spiking neural networks (SNNs), SNN processors based on field-programmable gate arrays (FPGAs) have become a research hotspot due to their efficiency and flexibility. However,…

Neural and Evolutionary Computing · Computer Science · 2026-01-06 · Hou Yue, Xiang Shuiying, Zou Tao, Huang Zhiquan, Shi Shangxuan, Guo Xingxing, Zhang Yahui, Zheng Ling, Hao Yue

Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis

Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block, which allows the…

Hardware Architecture · Computer Science · 2023-11-03 · Filippo Minnella, Teodoro Urso, Mihai T. Lazarescu, Luciano Lavagno

FPGA Based Accelerator for Neural Networks Computation with Flexible Pipelining

FPGAs are well suited to fixed-point neural network computation because of their high power efficiency and configurability. However, FPGA designs must be carefully refined to achieve high performance with limited hardware resources. We present an…

Hardware Architecture · Computer Science · 2022-01-03 · Qingyang Yi, Heming Sun, Masahiro Fujita

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-18 · Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-19 · Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao

Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks

Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application…

Image and Video Processing · Electrical Eng. & Systems · 2020-02-04 · Martin Ferianc, Hongxiang Fan, Ringo S. W. Chu, Jakub Stano, Wayne Luk

FPGA-accelerated machine learning inference as a service for particle physics computing

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning…

Data Analysis, Statistics and Probability · Physics · 2019-10-17 · Javier Duarte, Philip Harris, Scott Hauck, Burt Holzman, Shih-Chieh Hsu, Sergo Jindariani, Suffian Khan, Benjamin Kreis, Brian Lee, Mia Liu, Vladimir Lončar, Jennifer Ngadiuba, Kevin Pedro, Brandon Perez, Maurizio Pierini, Dylan Rankin, Nhan Tran, Matthew Trahms, Aristeidis Tsaris, Colin Versteeg, Ted W. Way, Dustin Werran, Zhenbin Wu

A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration

Artificial intelligence (AI) is increasingly deployed in real-time and energy-constrained environments, driving demand for hardware platforms that can deliver high performance and power efficiency. While central processing units (CPUs) and…

Hardware Architecture · Computer Science · 2026-01-28 · Aybars Yunusoglu, Talha Coskun, Hiruna Vishwamith, Murat Isik, I. Can Dikmen

An Efficient FPGA-Based Accelerator for Swin Transformer

Since its introduction, Swin Transformer has achieved remarkable results in computer vision and has sparked the need for dedicated hardware accelerators, specifically catering to edge computing demands. For the advantages of…

Hardware Architecture · Computer Science · 2023-08-29 · Zhiyang Liu, Pengyu Yin, Zhenhua Ren

Systolic Array-based Architecture for Low-Bit Integerized Vision Transformers

Transformer-based models are becoming more and more intelligent and are revolutionizing a wide range of human tasks. To support their deployment, AI labs offer inference services that consume hundreds of GWh of energy annually and charge…

Systems and Control · Electrical Eng. & Systems · 2025-08-29 · Ching-Yi Lin, Sahil Shah

A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations

Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…

Hardware Architecture · Computer Science · 2025-10-10 · Anastasios Petropoulos, Theodore Antonakopoulos

A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications?

When trained as generative models, deep learning algorithms have shown exceptional performance on tasks involving high-dimensional data such as image denoising and super-resolution. In an increasingly connected world dominated by mobile and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-10 · Ian Colbert, Jake Daly, Ken Kreutz-Delgado, Srinjoy Das

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-08-26 · Kingshuk Majumder, Shubham Nema, Uday Bondhugula

A Survey of FPGA-Based Neural Network Accelerator

Recent research on neural networks has shown significant advantages in machine learning over traditional algorithms based on handcrafted features and models. Neural networks are now widely adopted in domains such as image, speech, and video…

Hardware Architecture · Computer Science · 2018-12-07 · Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang

High Performance Scalable FPGA Accelerator for Deep Neural Networks

Low precision is the first-order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However, the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-09-02 · Sudarshan Srinivasan, Pradeep Janedula, Saurabh Dhoble, Sasikanth Avancha, Dipankar Das, Naveen Mellempudi, Bharat Daga, Martin Langhammer, Gregg Baeckler, Bharat Kaul

Exploiting FPGA Capabilities for Accelerated Biomedical Computing

This study presents advanced neural network architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for enhanced ECG signal…

Hardware Architecture · Computer Science · 2023-07-18 · Kayode Inadagbo, Baran Arig, Nisanur Alici, Murat Isik

On-FPGA Training with Ultra Memory Reduction: A Low-Precision Tensor Method

Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs…

Hardware Architecture · Computer Science · 2021-04-21 · Kaiqi Zhang, Cole Hawkins, Xiyuan Zhang, Cong Hao, Zheng Zhang

Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search

A fundamental question lies in almost every application of deep neural networks: what is the optimal neural architecture given a specific dataset? Recently, several Neural Architecture Search (NAS) frameworks have been developed that use…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-04 · Weiwen Jiang, Xinyi Zhang, Edwin H.-M. Sha, Lei Yang, Qingfeng Zhuge, Yiyu Shi, Jingtong Hu

FPGA deep learning acceleration based on convolutional neural network

Given the heavy computational load and long computation time of convolutional neural networks (CNNs), this paper proposes a CNN hardware accelerator based on a field-programmable gate array (FPGA). First,…

Hardware Architecture · Computer Science · 2020-12-08 · Xiong Jun

FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices

Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…

Hardware Architecture · Computer Science · 2025-11-11 · Arya Parameshwara, Santosh Hanamappa Mokashi