Related papers: Design optimization for high-performance computing…

200 papers

With the development of hardware-optimized deployment of spiking neural networks (SNNs), SNN processors based on field-programmable gate arrays (FPGAs) have become a research hotspot due to their efficiency and flexibility. However,…

Neural and Evolutionary Computing · Computer Science 2026-01-06 Hou Yue, Xiang Shuiying, Zou Tao, Huang Zhiquan, Shi Shangxuan, Guo Xingxing, Zhang Yahui, Zheng Ling, Hao Yue

Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the…

Hardware Architecture · Computer Science 2023-11-03 Filippo Minnella, Teodoro Urso, Mihai T. Lazarescu, Luciano Lavagno

FPGAs are well suited to fixed-point neural network computing due to their high power efficiency and configurability. However, the design must be intensively refined to achieve high performance using limited hardware resources. We present an…

Hardware Architecture · Computer Science 2022-01-03 Qingyang Yi, Heming Sun, Masahiro Fujita

Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-18 Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane OConnell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao

Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application…

Image and Video Processing · Electrical Eng. & Systems 2020-02-04 Martin Ferianc, Hongxiang Fan, Ringo S. W. Chu, Jakub Stano, Wayne Luk

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning…

Artificial intelligence (AI) is increasingly deployed in real-time and energy-constrained environments, driving demand for hardware platforms that can deliver high performance and power efficiency. While central processing units (CPUs) and…

Hardware Architecture · Computer Science 2026-01-28 Aybars Yunusoglu, Talha Coskun, Hiruna Vishwamith, Murat Isik, I. Can Dikmen

Since its introduction, Swin Transformer has achieved remarkable results in the field of computer vision, sparking the need for dedicated hardware accelerators, specifically catering to edge computing demands. For the advantages of…

Hardware Architecture · Computer Science 2023-08-29 Zhiyang Liu, Pengyu Yin, Zhenhua Ren

Transformer-based models are becoming more and more intelligent and are revolutionizing a wide range of human tasks. To support their deployment, AI labs offer inference services that consume hundreds of GWh of energy annually and charge…

Systems and Control · Electrical Eng. & Systems 2025-08-29 Ching-Yi Lin, Sahil Shah

Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…

Hardware Architecture · Computer Science 2025-10-10 Anastasios Petropoulos, Theodore Antonakopoulos

When trained as generative models, Deep Learning algorithms have shown exceptional performance on tasks involving high dimensional data such as image denoising and super-resolution. In an increasingly connected world dominated by mobile and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-10 Ian Colbert, Jake Daly, Ken Kreutz-Delgado, Srinjoy Das

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

Recent research on neural networks has shown significant advantages in machine learning over traditional algorithms based on handcrafted features and models. Neural networks are now widely adopted in areas like image, speech and video…

Hardware Architecture · Computer Science 2018-12-07 Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang

Low precision is the first-order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However, the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Sudarshan Srinivasan, Pradeep Janedula, Saurabh Dhoble, Sasikanth Avancha, Dipankar Das, Naveen Mellempudi, Bharat Daga, Martin Langhammer, Gregg Baeckler, Bharat Kaul

This study presents advanced neural network architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for enhanced ECG signal…

Hardware Architecture · Computer Science 2023-07-18 Kayode Inadagbo, Baran Arig, Nisanur Alici, Murat Isik

Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs…

Hardware Architecture · Computer Science 2021-04-21 Kaiqi Zhang, Cole Hawkins, Xiyuan Zhang, Cong Hao, Zheng Zhang

A fundamental question lies in almost every application of deep neural networks: what is the optimal neural architecture given a specific dataset? Recently, several Neural Architecture Search (NAS) frameworks have been developed that use…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-04 Weiwen Jiang, Xinyi Zhang, Edwin H.-M. Sha, Lei Yang, Qingfeng Zhuge, Yiyu Shi, Jingtong Hu

In view of the large amount of computation and long computation time of convolutional neural networks (CNNs), this paper proposes a CNN hardware accelerator based on a field-programmable gate array (FPGA). First,…

Hardware Architecture · Computer Science 2020-12-08 Xiong Jun

Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…

Hardware Architecture · Computer Science 2025-11-11 Arya Parameshwara, Santosh Hanamappa Mokashi