English
Related papers

Related papers: CODO: An Automated Compiler for Comprehensive Data…

200 papers

Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While modern dataflow accelerators incorporate…

Programming Languages · Computer Science 2025-09-09 Shihan Fang , Hongzheng Chen , Niansong Zhang , Jiajie Li , Han Meng , Adrian Liu , Zhiru Zhang

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science 2021-07-21 Endri Bezati , Mahyar Emami , Jörn Janneck , James Larus

FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime…

Hardware Architecture · Computer Science 2021-12-16 Puya Amiri , Arsène Pérard-Gayot , Richard Membarth , Philipp Slusallek , Roland Leißa , Sebastian Hack

Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy…

Hardware Architecture · Computer Science 2023-11-08 Hanchen Ye , Hyegang Jun , Deming Chen

In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

As deep neural networks develop significantly more diverse and complex, achieving high performance and efficiency on complicated DNN models faces pressing challenges. Modern DNN workloads are increasingly diverse in operation types, tensor…

Hardware Architecture · Computer Science 2026-05-25 Xingzhen Chen , Zhuoping Yang , Jinming Zhuang , Shixin Ji , Sarah Schultz , Zheng Dong , Weisong Shi , Peipei Zhou

Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-18 Mohamed S. Abdelfattah , David Han , Andrew Bitar , Roberto DiCecco , Shane OConnell , Nitika Shanker , Joseph Chu , Ian Prins , Joshua Fender , Andrew C. Ling , Gordon R. Chiu

In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. Data compression has come to rescue for…

Databases · Computer Science 2026-02-10 Gwangoo Yeo , Zhiyang Shen , Wei Cui , Matteo Interlandi , Rathijit Sen , Bailu Ding , Qi Chen , Minsoo Rhu

As a promising solution to boost the performance of distance-related algorithms (e.g., K-means and KNN), FPGA-based acceleration attracts lots of attention, but also comes with numerous challenges. In this work, we propose AccD, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Yuke Wang , Boyuan Feng , Gushu Li , Lei Deng , Yuan Xie , Yufei Ding

Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory…

Hardware Architecture · Computer Science 2022-11-09 Stephanie Soldavini , Karl F. A. Friebel , Mattia Tibaldi , Gerald Hempel , Jeronimo Castrillon , Christian Pilato

Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis…

Hardware Architecture · Computer Science 2021-09-01 Atefeh Sohrabizadeh , Cody Hao Yu , Min Gao , Jason Cong

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong , Peng Wei , Cody Hao Yu , Peng Zhang

Profile Guided Optimization (PGO) uses runtime profiling to direct compiler optimization decisions, effectively combining static analysis with actual execution behavior to enhance performance. Runtime profiles, collected through…

Performance · Computer Science 2025-07-23 Bingxin Liu , Yinghui Huang , Jianhua Gao , Jianjun Shi , Yongpeng Liu , Yipin Sun , Weixing Ji

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment. In…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Cong Hao , Xiaofan Zhang , Yuhong Li , Sitao Huang , Jinjun Xiong , Kyle Rupnow , Wen-mei Hwu , Deming Chen

Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-30 Wenzhi Fu , Jianlei Yang , Pengcheng Dai , Yiran Chen , Weisheng Zhao

This paper proposes CodeX, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. CodeX incorporates nonlinear encoding to the computation flow of…

Machine Learning · Computer Science 2019-01-18 Mohammad Samragh , Mojan Javaheripi , Farinaz Koushanfar

In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of…

Hardware Architecture · Computer Science 2024-10-18 Licheng Guo , Yuze Chi , Jason Lau , Linghao Song , Xingyu Tian , Moazin Khatti , Weikang Qiao , Jie Wang , Ecenur Ustun , Zhenman Fang , Zhiru Zhang , Jason Cong

Quantum circuit simulation is important in the evolution of quantum software and hardware. Novel algorithms can be developed and evaluated by performing quantum circuit simulations on classical computers before physical quantum computers…

Quantum Physics · Physics 2024-10-22 Yu-Tsung Wu , Po-Hsuan Huang , Kai-Chieh Chang , Chia-Heng Tu , Shih-Hao Hung

Edge-AI applications demand high-throughput, low-latency inference on FPGAs under tight resource and power constraints. This survey provides a comprehensive review of two key architectural decisions for FPGA-based neural network…

Hardware Architecture · Computer Science 2025-06-03 Richie Li

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye , Deming Chen
‹ Prev 1 2 3 10 Next ›