Related papers: ARENA: Asynchronous Reconfigurable Accelerator Rin…

Hardware Abstractions and Hardware Mechanisms to Support Multi-Task Execution on Coarse-Grained Reconfigurable Arrays

Domain-specific accelerators are used in various computing systems ranging from edge devices to data centers. Coarse-grained reconfigurable arrays (CGRAs) represent an architectural midpoint between the flexibility of an FPGA and the…

Hardware Architecture · Computer Science 2023-01-04 Taeyoung Kong, Kalhan Koul, Priyanka Raina, Mark Horowitz, Christopher Torng

STRELA: STReaming ELAstic CGRA Accelerator for Embedded Systems

Reconfigurable computing offers a good balance between flexibility and energy efficiency. When combined with software-programmable devices such as CPUs, it is possible to obtain higher performance by spatially distributing the…

Hardware Architecture · Computer Science 2024-04-22 Daniel Vazquez, Jose Miranda, Alfonso Rodriguez, Andres Otero, Pascuale Davide Schiavone, David Atienza

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges

With the emerging big data applications of Machine Learning, Speech Recognition, Artificial Intelligence, and DNA Sequencing in recent years, computer architecture research communities are facing the explosive scale of various data…

Hardware Architecture · Computer Science 2017-12-14 Chao Wang, Wenqi Lou, Lei Gong, Lihui Jin, Luchao Tan, Yahui Hu, Xi Li, Xuehai Zhou

Arena: Efficiently Training Large Models via Dynamic Scheduling and Adaptive Parallelism Co-Design

Efficiently training large-scale models (LMs) in GPU clusters involves two separate avenues: inter-job dynamic scheduling and intra-job adaptive parallelism (AP). However, existing dynamic schedulers struggle with large-model scheduling due…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-25 Chunyu Xue, Weihao Cui, Quan Chen, Chen Chen, Han Zhao, Shulai Zhang, Linmei Wang, Yan Li, Limin Xiao, Weifeng Zhang, Jing Yang, Bingsheng He, Minyi Guo

Flip: Data-Centric Edge CGRA Accelerator

Coarse-Grained Reconfigurable Arrays (CGRA) are promising edge accelerators due to the outstanding balance in flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PE)…

Hardware Architecture · Computer Science 2023-09-20 Dan Wu, Peng Chen, Thilini Kaushalya Bandara, Zhaoying Li, Tulika Mitra

Re-thinking Memory-Bound Limitations in CGRAs

Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at…

Hardware Architecture · Computer Science 2025-08-28 Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, Xiaomeng Han, Mingsong Lyu, Qingxu Deng, Nan Guan

NEURA: A Unified and Retargetable Compilation Framework for Coarse-Grained Reconfigurable Architectures

Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform, offering a balance between the performance and efficiency of specialized accelerators and the software programmability. However, their…

Programming Languages · Computer Science 2026-04-07 Shangkun Li, Jinming Ge, Diyuan Tao, Zeyu Li, Jiawei Liang, Linfeng Du, Jiang Xu, Wei Zhang, Cheng Tan

HYDRA: Hybrid Data Multiplexing and Run-time Layer Configurable DNN Accelerator

Deep neural networks (DNNs) offer plenty of challenges in executing efficient computation at edge nodes, primarily due to the huge hardware resource demands. The article proposes HYDRA, hybrid data multiplexing, and runtime layer…

Hardware Architecture · Computer Science 2026-03-31 Sonu Kumar, Komal Gupta, Gopal Raut, Mukul Lokhande, Santosh Kumar Vishvakarma

Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g., visual systems, robotics, etc. However, existing software solutions are not efficient. Therefore, many hardware accelerators have been proposed…

Machine Learning · Computer Science 2021-09-08 Sasindu Wijeratne, Sandaruwan Jayaweera, Mahesh Dananjaya, Ajith Pasqual

A flexible framework for early power and timing comparison of time-multiplexed CGRA kernel executions

At the intersection between traditional CPU architectures and more specialized options such as FPGAs or ASICs lies the family of reconfigurable hardware architectures, termed Coarse-Grained Reconfigurable Arrays (CGRAs). CGRAs are composed…

Hardware Architecture · Computer Science 2025-09-05 Maxime Henri Aspros, Juan Sapriza, Giovanni Ansaloni, David Atienza

Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI

AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. In this context, Field-Programmable Gate Arrays (FPGAs)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Arturo Urías Jiménez

DR-CGRA: Supporting Loop-Carried Dependencies in CGRAs Without Spilling Intermediate Values

Coarse-grain reconfigurable architectures (CGRAs) are gaining traction thanks to their performance and power efficiency. Utilizing CGRAs to accelerate the execution of tight loops holds great potential for achieving significant overall…

Hardware Architecture · Computer Science 2024-05-28 Elad Hadar, Yoav Etsion

Building an Open CGRA Ecosystem for Agile Innovation

Modern computing workloads, particularly in AI and edge applications, demand hardware-software co-design to meet aggressive performance and energy targets. Such co-design benefits from open and agile platforms that replace closed,…

Hardware Architecture · Computer Science 2025-08-27 Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Zhaoying Li, Dhananjaya Wijerathne, Li-Shiuan Peh, Tulika Mitra

CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture

Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required…

Hardware Architecture · Computer Science 2020-10-05 Mehdi Ahmadi, Shervin Vakili, J. M. Pierre Langlois

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

With the end of both Dennard's scaling and Moore's law, computer users and researchers are aggressively exploring alternative forms of computing in order to continue the performance scaling that we have come to enjoy. Among the more salient…

Hardware Architecture · Computer Science 2020-09-16 Artur Podobas, Kentaro Sano, Satoshi Matsuoka

FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review

Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains. However, their growing complexity has significantly increased computational demands, necessitating efficient hardware…

Machine Learning · Computer Science 2025-05-21 Junye Jiang, Yaan Zhou, Yuanhao Gong, Haoxuan Yuan, Shuanglong Liu

Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration

With increasing diversity in Deep Neural Network (DNN) models in terms of layer shapes and sizes, the research community has been investigating flexible/reconfigurable accelerator substrates. This line of research has opened up two…

Hardware Architecture · Computer Science 2022-04-26 Ananda Samajdar, Michael Pellauer, Tushar Krishna

Arena: A Learning-based Synchronization Scheme for Hierarchical Federated Learning--Technical Report

Federated learning (FL) enables collaborative model training among distributed devices without data sharing, but existing FL suffers from poor scalability because of global model synchronization. To address this issue, hierarchical…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-22 Tianyu Qi, Yufeng Zhan, Peng Li, Jingcai Guo, Yuanqing Xia

An ultra-low-power CGRA for accelerating Transformers at the edge

Transformers have revolutionized deep learning with applications in natural language processing, computer vision, and beyond. However, their computational demands make it challenging to deploy them on low-power edge devices. This paper…

Hardware Architecture · Computer Science 2025-07-18 Rohit Prasad

SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation

While there have been many studies on hardware acceleration for deep learning on images, there has been a rather limited focus on accelerating deep learning applications involving graphs. The unique characteristics of graphs, such as the…

Machine Learning · Computer Science 2021-11-12 Atefeh Sohrabizadeh, Yuze Chi, Jason Cong