
CODO: An Automated Compiler for Comprehensive Dataflow Optimization

Hardware Architecture 2026-04-15 v1

Abstract

FPGAs are well suited to dataflow architectures that process data in a streaming or pipelined manner, thus meeting the high computational and communication demands of emerging applications. However, manually implementing an efficient dataflow architecture for large-scale applications remains challenging, even for specialists who use high-level synthesis (HLS) to simplify FPGA programming. To address this, we introduce CODO, an automated compiler that generates feasible and efficient dataflow accelerators on FPGAs. CODO features a systematic method for detecting and eliminating both coarse-grained and fine-grained dataflow violations. Building on this, CODO performs both on-chip and off-chip data movement optimizations to maximize transfer efficiency. To further improve design quality, CODO performs automatic scheduling to generate high-performance dataflow accelerators with a balanced performance-resource trade-off. Synthesis results show that CODO delivers 1.45× to 4.52× latency speedups on typical computation kernels and 3.7× to 33.8× speedups on DNN models compared to state-of-the-art (SOTA) frameworks. In on-board evaluations, CODO achieves a 7.3× average speedup on CNN models and a 2.07× average speedup on the GPT-2 model over SOTA frameworks. The compiler is open-sourced at https://github.com/sjtu-zhao-lab/codo-artifact.
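As a hedged illustration of what a coarse-grained dataflow violation can look like (this is not CODO's detection algorithm, which the abstract does not specify): HLS dataflow regions generally expect each channel between tasks to have a single producer and a single consumer. The sketch below models a kernel as a dict mapping each task to the buffers it reads and writes, and flags buffers with more than one writer or reader. The function name `find_spsc_violations` and the task model are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_spsc_violations(tasks):
    """Flag buffers that break the single-producer/single-consumer rule.

    Hypothetical task model: `tasks` maps a task name to a pair
    (reads, writes) of sets of buffer names. Returns a dict mapping each
    offending buffer to (sorted producers, sorted consumers).
    """
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for name, (reads, writes) in tasks.items():
        for buf in reads:
            consumers[buf].add(name)
        for buf in writes:
            producers[buf].add(name)

    violations = {}
    # Buffers with no producer (external inputs) or no consumer (outputs)
    # are fine; only fan-in or fan-out beyond one task is reported.
    for buf in set(producers) | set(consumers):
        p, c = producers[buf], consumers[buf]
        if len(p) > 1 or len(c) > 1:
            violations[buf] = (sorted(p), sorted(c))
    return violations


# Example: buffer "x" is read by both "conv" and "add", a fan-out that a
# streaming channel cannot serve without duplicating the stream.
tasks = {
    "load": (set(), {"x"}),
    "conv": ({"x"}, {"y"}),
    "add": ({"x", "y"}, {"z"}),
    "store": ({"z"}, set()),
}
print(find_spsc_violations(tasks))
```

A common fix for such a fan-out is to insert a task that copies `x` into two separate streams, one per consumer; whether CODO uses this transformation is not stated in the abstract.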


Cite

@article{arxiv.2604.12618,
  title  = {CODO: An Automated Compiler for Comprehensive Dataflow Optimization},
  author = {Weichuang Zhang and Yiquan Wang and Xinzhou Zhang and Chi Zhang and Yu Feng and Xiaofeng Hou and Chao Li and Jieru Zhao and Minyi Guo},
  journal= {arXiv preprint arXiv:2604.12618},
  year   = {2026}
}

Comments

Accepted by ISCA 2026