Exploiting Code Symmetries for Learning Program Semantics

Kexin Pei; Weichen Li; Qirui Jin; Shuyang Liu; Scott Geng; Lorenzo Cavallaro; Junfeng Yang; Suman Jana

Categories: Machine Learning; Cryptography and Security; Programming Languages
Date: 2024-09-10 (v9)

Abstract

This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning about code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
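
The central claim above is that self-attention can be made equivariant to a permutation group defined over the program dependence graph (PDG). The following is a minimal Python sketch of that idea, not SymC's actual architecture. It assumes a toy straight-line program, data-dependence edges only (no control dependence), and a hypothetical attention bias of 1.0 between dependent statements; the names `dependence_edges`, `graph_bias`, and `biased_self_attention` are illustrative. The sketch enumerates the statement permutations that map the dependence graph onto itself (here, swapping the two independent reads) and checks that permuting the input embeddings by such a permutation permutes the attention output in the same way.

```python
import math
from itertools import permutations

# Hypothetical toy program: (variables defined, variables used) per statement.
#   s0: a = read()
#   s1: b = read()
#   s2: c = a + b
PROGRAM = [({"a"}, set()), ({"b"}, set()), ({"c"}, {"a", "b"})]


def dependence_edges(program):
    """Data-dependence edges (i, j): statement j reads a variable last written by i."""
    edges, last_def = set(), {}
    for j, (defs, uses) in enumerate(program):
        for var in uses:
            if var in last_def:
                edges.add((last_def[var], j))
        for var in defs:
            last_def[var] = j
    return edges


def automorphisms(n, edges):
    """All statement permutations that map the dependence graph onto itself."""
    return [pi for pi in permutations(range(n))
            if {(pi[i], pi[j]) for i, j in edges} == edges]


def graph_bias(n, edges):
    """Assumed (hypothetical) bias: 1.0 on the diagonal and between dependent statements."""
    return [[1.0 if i == j or (i, j) in edges or (j, i) in edges else 0.0
             for j in range(n)] for i in range(n)]


def biased_self_attention(x, bias):
    """Single-head self-attention with queries = keys = values = x plus a graph bias."""
    d = len(x[0])
    out = []
    for i in range(len(x)):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) + bias[i][j]
                  for j in range(len(x))]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * x[j][k] for j, w in enumerate(weights)) / total
                    for k in range(d)])
    return out


def permute_rows(rows, pi):
    """Move row i to position pi[i]."""
    result = [None] * len(rows)
    for i, row in enumerate(rows):
        result[pi[i]] = row
    return result


edges = dependence_edges(PROGRAM)
group = automorphisms(len(PROGRAM), edges)
bias = graph_bias(len(PROGRAM), edges)
x = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.1]]  # arbitrary per-statement embeddings
base = biased_self_attention(x, bias)

# Equivariance check: attention(pi . x) == pi . attention(x) for every automorphism pi.
for pi in group:
    permuted = biased_self_attention(permute_rows(x, pi), bias)
    expected = permute_rows(base, pi)
    assert all(math.isclose(a, b, abs_tol=1e-12)
               for r1, r2 in zip(permuted, expected) for a, b in zip(r1, r2))

print(sorted(edges), group)  # [(0, 2), (1, 2)] [(0, 1, 2), (1, 0, 2)]
```

Because the bias depends only on the graph, every automorphism π satisfies bias[π(i)][π(j)] = bias[i][j], which is exactly what lets the output commute with π. A permutation outside the group, such as one that moves `s2` ahead of the statements it reads, carries no such guarantee.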

Keywords

code generation, large language model, logic programming

Cite

@article{arxiv.2308.03312,
  title  = {Exploiting Code Symmetries for Learning Program Semantics},
  author = {Kexin Pei and Weichen Li and Qirui Jin and Shuyang Liu and Scott Geng and Lorenzo Cavallaro and Junfeng Yang and Suman Jana},
  journal= {arXiv preprint arXiv:2308.03312},
  year   = {2024}
}
