Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

Xinhao Cheng; Zhihao Zhang; Yu Zhou; Jianan Ji; Jinchen Jiang; Zepeng Zhao; Ziruo Xiao; Zihao Ye; Yingyi Huang; Ruihang Lai; Hongyi Jin; Bohan Hou; Mengdi Wu; Yixin Dong; Anthony Yip; Zihao Ye; Songting Wang; Wenqin Yang; Xupeng Miao; Tianqi Chen; Zhihao Jia

Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages (2025-12-30, v1)

Abstract

We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance mega-kernel. MPK introduces an SM-level graph representation that captures data dependencies at the granularity of individual streaming multiprocessors (SMs), enabling cross-operator software pipelining, fine-grained kernel overlap, and other previously infeasible GPU optimizations. The MPK compiler lowers tensor programs into highly optimized SM-level task graphs and generates optimized CUDA implementations for all tasks, while the MPK in-kernel parallel runtime executes these tasks within a single mega-kernel using decentralized scheduling across SMs. Together, these components provide end-to-end kernel fusion with minimal developer effort while preserving the flexibility of existing programming models. Our evaluation shows that MPK reduces end-to-end inference latency by up to 1.7× relative to existing kernel-per-operator LLM serving systems, pushing LLM inference performance close to hardware limits. MPK is publicly available at https://github.com/mirage-project/mirage.
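
To make the idea of fine-grained, SM-level dependencies more concrete, the following is a minimal toy sketch in Python. It is not MPK's implementation or API: the function `simulate_waves`, the task names (`mm0`, `act0`, ...), and the wave-based scheduling model are all hypothetical and chosen only for illustration. The sketch compares an operator-level dependency (every activation tile waits for the whole matmul) with a tile-level dependency (each activation tile waits only for its own matmul tile), with two simulated workers standing in for SMs.

```python
from collections import deque


def simulate_waves(tasks, deps, num_workers):
    """Toy simulation (hypothetical, not MPK's runtime).

    tasks: list of task names.
    deps: dict mapping a task name to the list of tasks it must wait for.
    Returns a list of "waves"; each wave holds the tasks run concurrently.
    """
    # Number of unfinished prerequisites per task.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    # Reverse edges: which tasks are unblocked when a task finishes.
    dependents = {t: [] for t in tasks}
    for task, prerequisites in deps.items():
        for p in prerequisites:
            dependents[p].append(task)

    ready = deque(t for t in tasks if remaining[t] == 0)
    waves = []
    while ready:
        # Each simulated worker takes at most one ready task per wave.
        wave = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        waves.append(wave)
        # Completing a task decrements the counters of its dependents.
        for t in wave:
            for d in dependents[t]:
                remaining[d] -= 1
                if remaining[d] == 0:
                    ready.append(d)

    if sum(len(w) for w in waves) != len(tasks):
        raise ValueError("dependency cycle detected")
    return waves


tasks = ["mm0", "mm1", "mm2", "act0", "act1", "act2"]

# Operator-level dependencies: each activation tile waits for all matmul tiles.
coarse_deps = {f"act{i}": ["mm0", "mm1", "mm2"] for i in range(3)}
# Tile-level dependencies: activation tile i waits only for matmul tile i.
fine_deps = {f"act{i}": [f"mm{i}"] for i in range(3)}

coarse_waves = simulate_waves(tasks, coarse_deps, num_workers=2)
fine_waves = simulate_waves(tasks, fine_deps, num_workers=2)

print(len(coarse_waves), coarse_waves)
print(len(fine_waves), fine_waves)
```

In this toy setting the tile-level graph finishes in fewer waves, because an activation tile can occupy the worker that would otherwise sit idle while the last matmul tile runs. This only illustrates the kind of cross-operator overlap the abstract describes; real scheduling on a GPU involves many factors this sketch ignores.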

Keywords

compiler optimization; computer architecture

Cite

@article{arxiv.2512.22219,
  title  = {Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs},
  author = {Xinhao Cheng and Zhihao Zhang and Yu Zhou and Jianan Ji and Jinchen Jiang and Zepeng Zhao and Ziruo Xiao and Zihao Ye and Yingyi Huang and Ruihang Lai and Hongyi Jin and Bohan Hou and Mengdi Wu and Yixin Dong and Anthony Yip and Zihao Ye and Songting Wang and Wenqin Yang and Xupeng Miao and Tianqi Chen and Zhihao Jia},
  journal= {arXiv preprint arXiv:2512.22219},
  year   = {2025}
}
