Retrieval-Based Neural Code Generation

Computation and Language 2018-08-31 v1

Abstract

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU.
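
The three steps in the abstract can be illustrated with a small sketch. It is not the ReCode implementation: the abstract does not give the scoring formulas, so the sketch assumes a normalized word-level edit distance as the dynamic-programming similarity, treats action sequences as flat lists rather than AST subtrees, and adds a fixed bonus to log-probabilities. All function names and the `bonus` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[List[str], List[str]]  # (sentence tokens, action sequence)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity: 1 minus the edit distance normalized by the longer sentence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def retrieve(query: Sequence[str], corpus: List[Example], k: int = 1) -> List[Example]:
    """Step 1: return the k training examples whose sentences best match the query."""
    ranked = sorted(corpus, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]


def action_ngrams(actions: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Step 2: collect contiguous n-grams of actions (a flat stand-in for subtrees)."""
    return {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}


def boost(
    log_probs: Dict[str, float],
    history: List[str],
    ngrams: Set[Tuple[str, ...]],
    bonus: float = 1.0,
) -> Dict[str, float]:
    """Step 3: raise the score of any action that would complete a retrieved n-gram."""
    boosted = dict(log_probs)
    for gram in ngrams:
        prefix, nxt = gram[:-1], gram[-1]
        matches = len(history) >= len(prefix) and (
            tuple(history[len(history) - len(prefix):]) == prefix
        )
        if matches and nxt in boosted:
            boosted[nxt] += bonus
    return boosted
```

In a decoder, `boost` would be applied to the model's scores at each time step, with `history` holding the actions predicted so far and `ngrams` built from the retrieved examples.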

Cite

@article{arxiv.1808.10025,
  title  = {Retrieval-Based Neural Code Generation},
  author = {Shirley Anugrah Hayati and Raphael Olivier and Pravalika Avvaru and Pengcheng Yin and Anthony Tomasic and Graham Neubig},
  journal= {arXiv preprint arXiv:1808.10025},
  year   = {2018}
}

Comments

This paper was accepted at EMNLP 2018. It has 6 pages.