DOCE: Finding the Sweet Spot for Execution-Based Code Generation

Computation and Language 2024-10-17 v4 Artificial Intelligence Programming Languages

Abstract

Recently, a diverse set of decoding and reranking procedures has been shown to be effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution (DOCE), a comprehensive framework that includes candidate generation, n-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as its core components. We then study the contributions of these components through execution-based evaluation metrics. Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods. Furthermore, we assess the impact of filtering based on trial unit tests, a simple and effective strategy that has often been overlooked in prior work. We also propose self-debugging on multiple candidates, obtaining state-of-the-art performance on reranking for code generation. We expect our framework to provide a solid guideline for future research on code generation.
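To make two of the components named in the abstract concrete, the following is a minimal sketch of filtering on trial unit tests followed by execution-based MBR selection. It is an illustration under stated assumptions, not the authors' implementation: candidates are modeled as plain Python callables rather than generated source strings, the MBR utility is assumed to be exact agreement of execution outputs on a shared set of inputs, and the names `run`, `filter_by_trial_tests`, and `mbr_select` are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Assumption: a candidate program is modeled as a one-argument callable.
Candidate = Callable[[Any], Any]


def run(candidate: Candidate, x: Any) -> Tuple[str, Any]:
    """Execute a candidate on one input; record the exception type on failure."""
    try:
        return ("ok", candidate(x))
    except Exception as exc:
        return ("error", type(exc).__name__)


def filter_by_trial_tests(
    candidates: Sequence[Candidate],
    trial_tests: Sequence[Tuple[Any, Any]],
) -> List[Candidate]:
    """Keep candidates that pass every trial (input, expected output) pair."""
    kept = [
        c for c in candidates
        if all(run(c, x) == ("ok", y) for x, y in trial_tests)
    ]
    # Assumption: if nothing passes, fall back to the unfiltered pool.
    return kept or list(candidates)


def mbr_select(candidates: Sequence[Candidate], inputs: Sequence[Any]) -> Candidate:
    """Pick the candidate whose execution outputs agree with the most candidates."""
    outputs = [tuple(run(c, x) for x in inputs) for c in candidates]
    # Utility is 1 when two candidates produce identical outputs on all inputs.
    scores = [sum(o == other for other in outputs) for o in outputs]
    return candidates[scores.index(max(scores))]


# Hypothetical candidates for an absolute-value task.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,        # wrong on negative inputs
    lambda x: 1 // 0,   # always raises
]
survivors = filter_by_trial_tests(candidates, trial_tests=[(3, 3)])
best = mbr_select(survivors, inputs=[-2, 0, 5])
```

In this toy setup the single trial test removes only the crashing candidate, and MBR then prefers the behavior shared by the two correct candidates over the one that is wrong on negative inputs. Self-debugging on multiple candidates, also named in the abstract, is not sketched here.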

Cite

@article{arxiv.2408.13745,
  title  = {DOCE: Finding the Sweet Spot for Execution-Based Code Generation},
  author = {Haau-Sing Li and Patrick Fernandes and Iryna Gurevych and André F. T. Martins},
  journal= {arXiv preprint arXiv:2408.13745},
  year   = {2024}
}

Comments

10 pages (32 including appendix), 5 figures, 25 tables. Prompts are provided in the GitHub repository to avoid potential text overlap with other papers.