Stream-HLS: Towards Automatic Dataflow Acceleration

Suhail Basalama; Jason Cong

doi:10.1145/3706628.3708878

Hardware Architecture 2025-01-17 v1

Authors: Suhail Basalama, Jason Cong

Abstract

High-level synthesis (HLS) has enabled the rapid development of custom hardware circuits for many software applications. However, developing high-performance hardware circuits using HLS is still a non-trivial task requiring expertise in hardware design. Further, the hardware design space, especially for multi-kernel applications, grows exponentially. Therefore, several HLS automation and abstraction frameworks have been proposed recently, but many issues remain unresolved. These issues include: 1) relying mainly on hardware directives (pragmas) to apply hardware optimizations without exploring loop scheduling opportunities; 2) targeting single-kernel applications only; 3) lacking automatic and/or global design space exploration; and 4) missing critical hardware optimizations, such as graph-level pipelining for multi-kernel applications. To address these challenges, we propose Stream-HLS, a novel methodology and framework built on top of the popular multi-level intermediate representation (MLIR) infrastructure. Our framework takes C/C++ or PyTorch code and automatically generates an optimized dataflow architecture along with host code for field-programmable gate arrays (FPGAs). To achieve this, we developed an accurate analytical performance model for global scheduling and optimization of dataflow architectures. Stream-HLS is evaluated using various standard HLS benchmarks and real-world benchmarks from transformer models, convolutional neural networks, and multilayer perceptrons. Stream-HLS designs outperform the designs of prior state-of-the-art automation frameworks and the manually optimized designs of abstraction frameworks by geometric means of up to $79.43\times$ and $10.62\times$, respectively. Finally, the Stream-HLS framework is modularized, extensible, and open-sourced at https://github.com/UCLA-VAST/Stream-HLS (https://doi.org/10.5281/zenodo.14585909).

Keywords

high-level synthesis; data stream processing; large language model inference

Cite

@article{arxiv.2501.09118,
  title  = {Stream-HLS: Towards Automatic Dataflow Acceleration},
  author = {Suhail Basalama and Jason Cong},
  journal= {arXiv preprint arXiv:2501.09118},
  year   = {2025}
}
