English

Ranking LLM-Generated Loop Invariants for Program Verification

Programming Languages 2024-02-14 v3 Artificial Intelligence Computation and Language Software Engineering

Abstract

Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier. The source code and the experimental data for this paper are available in \url{https://github.com/microsoft/NeuralInvariantRanker}.

Keywords

Cite

@article{arxiv.2310.09342,
  title  = {Ranking LLM-Generated Loop Invariants for Program Verification},
  author = {Saikat Chakraborty and Shuvendu K. Lahiri and Sarah Fakhoury and Madanlal Musuvathi and Akash Lal and Aseem Rastogi and Aditya Senthilnathan and Rahul Sharma and Nikhil Swamy},
  journal= {arXiv preprint arXiv:2310.09342},
  year   = {2024}
}

Comments

Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-findings 2023)

R2 v1 2026-06-28T12:50:16.410Z