Related papers: Accelerating Algorithms using a Dataflow Graph in …

Static Communication Analysis for Hardware Design

Hardware acceleration of algorithms is an effective method for improving performance in high-demand computational tasks. However, developing hardware designs for such acceleration fundamentally differs from software development, as it…

Hardware Architecture · Computer Science 2025-05-28 Mads Rosendahl , Maja H. Kirkeby

A Self-Reconfigurable Computing Platform Hardware Architecture

Field Programmable Gate Arrays (FPGAs) have recently been increasingly used for highly-parallel processing of compute intensive tasks. This paper introduces an FPGA hardware platform architecture that is PC-based, allows for fast…

Hardware Architecture · Computer Science 2007-05-23 Andreas Weisensee , Darran Nathan

Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI

AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. In this context, Field-Programmable Gate Arrays (FPGAs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Arturo Urías Jiménez

Streaming Task Graph Scheduling for Dataflow Architectures

Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-06 Tiziano De Matteis , Lukas Gianinazzi , Johannes de Fine Licht , Torsten Hoefler

Exploring Memory Access Patterns for Graph Processing Accelerators

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann , Daniel Ritter , Holger Fröning

A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations

Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…

Hardware Architecture · Computer Science 2025-10-10 Anastasios Petropoulos , Theodore Antonakopoulos

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

Offloading compute intensive nested loops to execute on FPGA accelerators have been demonstrated by numerous researchers as an effective performance enhancement technique across numerous application domains. To construct such accelerators…

Hardware Architecture · Computer Science 2015-09-02 Cheng Liu , Ho-Cheung Ng , Hayden Kwok-Hay So

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges

With the emerging big data applications of Machine Learning, Speech Recognition, Artificial Intelligence, and DNA Sequencing in recent years, computer architecture research communities are facing the explosive scale of various data…

Hardware Architecture · Computer Science 2017-12-14 Chao Wang , Wenqi Lou , Lei Gong , Lihui Jin , Luchao Tan , Yahui Hu , Xi Li , Xuehai Zhou

TAPA: A Scalable Task-Parallel Dataflow Programming Framework for Modern FPGAs with Co-Optimization of HLS and Physical Design

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…

Hardware Architecture · Computer Science 2024-10-18 Licheng Guo , Yuze Chi , Jason Lau , Linghao Song , Xingyu Tian , Moazin Khatti , Weikang Qiao , Jie Wang , Ecenur Ustun , Zhenman Fang , Zhiru Zhang , Jason Cong

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha , Yuchen Li , Bingsheng He , Kian-Lee Tan

FPGA based hybrid architecture for parallelizing RRT

Field Programmable Gate Arrays(FPGA) exceed the computing power of software based implementations by breaking the paradigm of sequential execution and accomplishing more per clock cycle by enabling hardware level parallelization at an…

Robotics · Computer Science 2016-07-20 Gurshaant Malik , Krishna Gupta , Raunak Dharani , K Madhava Krishna

Fast Processing of Large Graph Applications Using Asynchronous Architecture

Graph algorithms and techniques are increasingly being used in scientific and commercial applications to express relations and explore large data sets. Although conventional or commodity computer architectures, like CPU or GPU, can compute…

Hardware Architecture · Computer Science 2017-07-03 Michel A. Kinsy , Rashmi S. Agrawal , Hien D. Nguyen

Flip: Data-Centric Edge CGRA Accelerator

Coarse-Grained Reconfigurable Arrays (CGRA) are promising edge accelerators due to the outstanding balance in flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PE)…

Hardware Architecture · Computer Science 2023-09-20 Dan Wu , Peng Chen , Thilini Kaushalya Bandara , Zhaoying Li , Tulika Mitra

Reconfigurable Parallel Data Flow Architecture

This paper presents a reconfigurable parallel data flow architecture. This architecture uses the concepts of multi-agent paradigm in reconfigurable hardware systems. The utilization of this new paradigm has the potential to greatly increase…

Multiagent Systems · Computer Science 2010-03-11 Hamid Reza Naji

A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration

Artificial intelligence (AI) is increasingly deployed in real-time and energy-constrained environments, driving demand for hardware platforms that can deliver high performance and power efficiency. While central processing units (CPUs) and…

Hardware Architecture · Computer Science 2026-01-28 Aybars Yunusoglu , Talha Coskun , Hiruna Vishwamith , Murat Isik , I. Can Dikmen

MatrixFlow: System-Accelerator co-design for high-performance transformer applications

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu , Marina Zapater , David Atienza

FPGA based Parallelized Architecture of Efficient Graph based Image Segmentation Algorithm

Efficient and real time segmentation of color images has a variety of importance in many fields of computer vision such as image compression, medical imaging, mapping and autonomous navigation. Being one of the most computationally…

Computer Vision and Pattern Recognition · Computer Science 2017-10-09 Roopal Nahar , Akanksha Baranwal , K. Madhava Krishna

Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures

The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the skill-set of the average domain scientist. To maintain performance portability in the future, it is imperative to decouple…

Programming Languages · Computer Science 2020-01-06 Tal Ben-Nun , Johannes de Fine Licht , Alexandros Nikolaos Ziogas , Timo Schneider , Torsten Hoefler

Leveraging Recurrent Patterns in Graph Accelerators

Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ compu-tation of ReRAM-based crossbars to maximize computational efficiency. However, existing designs suffer from…

Hardware Architecture · Computer Science 2025-12-02 Masoud Rahimi , Sébastien Le Beux

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-27 Hao Wu , Daniel Lohmann , Wolfgang Schröder-Preikschat