LLM-Vectorizer: LLM-based Verified Loop Vectorizer

Software Engineering | 2024-06-10 | v1 | Artificial Intelligence | Machine Learning | Performance

Abstract

Vectorization is a powerful optimization technique that significantly boosts the performance of high-performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architectures and compilers. In this paper, we evaluate the potential of large language models (LLMs) to generate vectorized (Single Instruction Multiple Data, SIMD) code from scalar programs that process individual array elements. We propose a novel multi-agent approach, organized as a finite-state machine, that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high-performance vectorized code with run-time speedups ranging from 1.1x to 9.4x compared to state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of the vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach verifies 38.2% of the vectorizations as correct on the TSVC benchmark dataset.
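To make the scalar-to-SIMD transformation concrete, the following is a minimal sketch of what a scalar loop and a hand-vectorized counterpart written with compiler intrinsics might look like, together with a simple test-based comparison of the two. It is an illustration under stated assumptions, not the paper's implementation: the function names `add_scalar`, `add_sse`, and `outputs_match` are hypothetical, the choice of 128-bit SSE intrinsics (and therefore an x86-64 target) is an assumption made for portability of the example, and the random-input check is far weaker than the formal verification with Alive2 that the abstract describes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86-64 target */

/* Scalar reference loop: processes one array element per iteration. */
void add_scalar(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hand-vectorized version (hypothetical): four floats per iteration. */
void add_sse(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);          /* unaligned load of 4 floats */
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc)); /* 4 additions at once */
    }
    for (; i < n; i++)                            /* scalar remainder loop */
        a[i] = b[i] + c[i];
}

/* Test-based check (illustrative only): returns 1 if both versions
   produce identical outputs on pseudo-random inputs, 0 otherwise. */
int outputs_match(size_t n, unsigned seed) {
    /* Allocate n + 1 elements so that n == 0 never requests zero bytes. */
    float *b  = malloc((n + 1) * sizeof *b);
    float *c  = malloc((n + 1) * sizeof *c);
    float *r1 = malloc((n + 1) * sizeof *r1);
    float *r2 = malloc((n + 1) * sizeof *r2);
    int ok = (b && c && r1 && r2);
    if (ok) {
        srand(seed);
        for (size_t i = 0; i < n; i++) {
            b[i] = (float)(rand() % 1000) / 8.0f;
            c[i] = (float)(rand() % 1000) / 8.0f;
        }
        add_scalar(r1, b, c, n);
        add_sse(r2, b, c, n);
        for (size_t i = 0; i < n; i++)
            if (r1[i] != r2[i]) { ok = 0; break; }
    }
    free(b); free(c); free(r1); free(r2);
    return ok;
}
```

Calling `outputs_match` with lengths that are not multiples of four (for example 7 or 1001) exercises the remainder loop, which is a common source of bugs in hand-written vector code. Passing such tests only shows agreement on the sampled inputs; it does not establish equivalence for all inputs, which is the gap that translation validation is meant to close.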

Cite

@article{arxiv.2406.04693,
  title  = {LLM-Vectorizer: LLM-based Verified Loop Vectorizer},
  author = {Jubi Taneja and Avery Laird and Cong Yan and Madan Musuvathi and Shuvendu K. Lahiri},
  journal= {arXiv preprint arXiv:2406.04693},
  year   = {2024}
}