Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization

Yangjie Zhou; Honglin Zhu; Qian Qiu; Weihao Cui; Zihan Liu; Cong Guo; Siyuan Feng; Jintao Meng; Haidong Lan; Jingwen Leng; Wenxi Zhu; Minwen Deng

Distributed, Parallel, and Cluster Computing 2024-09-04 v1

Abstract

Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to efficiently manage the diverse and unpredictable shapes encountered in real-world scenarios, often resulting in suboptimal performance. To tackle these issues, we introduce Vortex, a hardware-driven and sample-free compiler tailored for dynamic-shape tensor programs. Vortex capitalizes on detailed hardware information and hierarchizes the strategy space to facilitate high-performance code generation without relying on runtime shape samples. It features a unique bidirectional compilation workflow, combining top-down abstraction for aligning tensor program execution with hardware hierarchies and bottom-up kernel construction to narrow the search space, enabling Vortex to achieve remarkable efficiency. Comprehensive evaluations confirm that Vortex reduces compilation time by $176\times$ compared to the existing dynamic-shape compiler. Additionally, it substantially outperforms existing vendor-provided libraries and dynamic-shape compilers on both CPU and GPU platforms, delivering speedups of $2.53\times$ and $3.01\times$, respectively.

Keywords

deep neural network acceleration; deep neural network compiler optimization

Cite

@article{arxiv.2409.01075,
  title  = {Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization},
  author = {Yangjie Zhou and Honglin Zhu and Qian Qiu and Weihao Cui and Zihan Liu and Cong Guo and Siyuan Feng and Jintao Meng and Haidong Lan and Jingwen Leng and Wenxi Zhu and Minwen Deng},
  journal= {arXiv preprint arXiv:2409.01075},
  year   = {2024}
}
