Related papers: Developing and Deploying Advanced Algorithms to No…

FAST: FPGA-based Subgraph Matching on Massive Graphs

Subgraph matching is a basic operation widely used in many applications. However, due to its NP-hardness and the explosive growth of graph data, it is challenging to compute subgraph matching, especially in large graphs. In this paper, we…

Databases · Computer Science 2021-02-25 Xin Jin, Zhengyi Yang, Xuemin Lin, Shiyu Yang, Lu Qin, You Peng

Design space exploration for image processing architectures on FPGA targets

Due to the emergence of embedded applications in image and video processing, communication and cryptography, improvement of pictorial information for better human perception like deblurring, denoising in several fields such as satellite…

Hardware Architecture · Computer Science 2014-04-16 Chandrajit Pal, Avik Kotal, Asit Samanta, Amlan Chakrabarti, Ranjan Ghosh

A Reconfigurable Vector Instruction Processor for Accelerating a Convection Parametrization Model on FPGAs

High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…

Hardware Architecture · Computer Science 2015-04-20 Syed Waqar Nabi, Saji N. Hameed, Wim Vanderbauwhede

Real Time FPGA Based CNNs for Detection, Classification, and Tracking in Autonomous Systems: State of the Art Designs and Optimizations

This paper presents a comprehensive review of recent advances in deploying convolutional neural networks (CNNs) for object detection, classification, and tracking on Field Programmable Gate Arrays (FPGAs). With the increasing demand for…

Hardware Architecture · Computer Science 2025-09-05 Safa Mohammed Sali, Mahmoud Meribout, Ashiyana Abdul Majeed

Late Breaking Results: Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks

This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and…

Emerging Technologies · Computer Science 2024-10-31 Tianliang Ma, Guangxi Fan, Xuguang Sun, Zhihui Deng, Kainlu Low, Leilai Shao

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang, Leo Zhang, Eric Liu

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics

Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both hardware and software. Even though…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-13 Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso

High-Performance Parallel Implementation of Genetic Algorithm on FPGA

Genetic Algorithms (GAs) are used to solve search and optimization problems in which an optimal solution can be found using an iterative process with probabilistic and non-deterministic transitions. However, depending on the problem's…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-23 Matheus F. Torquato, Marcelo A. C. Fernandes

A Survey of FPGA-Based Robotic Computing

Recent researches on robotics have shown significant improvement, spanning from algorithms, mechanics to hardware architectures. Robotics, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in…

Robotics · Computer Science 2021-03-08 Zishen Wan, Bo Yu, Thomas Yuang Li, Jie Tang, Yuhao Zhu, Yu Wang, Arijit Raychowdhury, Shaoshan Liu

From Circuits to SoC Processors: Arithmetic Approximation Techniques & Embedded Computing Methodologies for DSP Acceleration

The computing industry is forced to find alternative design approaches and computing platforms to sustain increased power efficiency, while providing sufficient performance. Among the examined solutions, Approximate Computing, Hardware…

Hardware Architecture · Computer Science 2024-09-09 Vasileios Leon

FPGA Based Accelerator for Neural Networks Computation with Flexible Pipelining

FPGA is appropriate for fix-point neural networks computing due to high power efficiency and configurability. However, its design must be intensively refined to achieve high performance using limited hardware resources. We present an…

Hardware Architecture · Computer Science 2022-01-03 Qingyang Yi, Heming Sun, Masahiro Fujita

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

A Resource-efficient Spiking Neural Network Accelerator Supporting Emerging Neural Encoding

Spiking neural networks (SNNs) recently gained momentum due to their low-power multiplication-free computing and the closer resemblance of biological processes in the nervous system of humans. However, SNNs require very long spike trains…

Hardware Architecture · Computer Science 2022-06-07 Daniel Gerlinghoff, Zhehui Wang, Xiaozhe Gu, Rick Siow Mong Goh, Tao Luo

Exploiting FPGA Capabilities for Accelerated Biomedical Computing

This study presents advanced neural network architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for enhanced ECG signal…

Hardware Architecture · Computer Science 2023-07-18 Kayode Inadagbo, Baran Arig, Nisanur Alici, Murat Isik

FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI

This research introduces an FPGA-based hardware accelerator to optimize the Singular Value Decomposition (SVD) and Fast Fourier transform (FFT) operations in AI models. The proposed design aims to improve processing speed and reduce…

Hardware Architecture · Computer Science 2025-04-15 Hong Ding, Chia Chao Kang, SuYang Xi, Zehang Liu, Xuan Zhang, Yi Ding

Hardware Acceleration of HPC Computational Flow Dynamics using HBM-enabled FPGAs

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the uttermost importance to simulate increasingly larger computational models, hardware acceleration is…

Hardware Architecture · Computer Science 2022-01-13 Tom Hogervorst, Tong Dong Qiu, Giacomo Marchiori, Alf Birger, Markus Blatt, Razvan Nane

Full-stack Optimization for Accelerating CNNs with FPGA Validation

We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate arrays (FPGA) implementations. By jointly optimizing CNN models,…

Machine Learning · Computer Science 2019-05-03 Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to…

Mathematical Software · Computer Science 2011-09-21 Felipe A. Cruz, Simon K. Layton, Lorena A. Barba

Accelerating Scientific Computations with Mixed Precision Algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and…

Mathematical Software · Computer Science 2015-05-13 Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, Stanimire Tomov