Modeling Programs Hierarchically with Stack-Augmented LSTM

Software Engineering 2020-02-12 v1

Abstract

Programming language modeling has attracted extensive attention in recent years and plays an essential role in program processing. Statistical language models, which were originally designed for natural languages, have been widely used to model programming languages. However, unlike natural languages, programming languages have an explicit, hierarchical structure that is hard for traditional statistical language models to learn. To address this challenge, we propose a novel Stack-Augmented LSTM neural network for programming language modeling. Adding a stack memory component to the LSTM network enables our model to capture the hierarchical information of programs through PUSH and POP operations, which in turn allows it to capture long-term dependencies in programs. We evaluate the proposed model on three program analysis tasks: code completion, program classification, and code summarization. Evaluation results show that our model outperforms the baseline models on all three tasks, indicating that by capturing the structural information of programs with a stack, it can represent programs more precisely.
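
The role of PUSH and POP can be illustrated with a toy sketch. This is not the paper's model: there is no LSTM here, the string `context` is only a stand-in for a recurrent hidden state, and the function name `trace_stack` and the brace tokens are assumptions made for illustration. It shows only how a stack lets the enclosing context be restored after a nested block ends, however long that block was.

```python
def trace_stack(tokens, open_tok="{", close_tok="}"):
    """Return (token, context) pairs, using a stack to track nesting.

    Hypothetical illustration: `context` stands in for a hidden state.
    """
    stack = []          # saved contexts of the enclosing blocks
    context = "<root>"  # context of the outermost level
    trace = []
    for tok in tokens:
        if tok == open_tok:
            stack.append(context)  # PUSH: save the current context
            context = f"{context}/block{len(stack)}"
        elif tok == close_tok:
            if not stack:
                raise ValueError("unbalanced close token")
            context = stack.pop()  # POP: restore the enclosing context
        trace.append((tok, context))
    return trace


if __name__ == "__main__":
    for tok, ctx in trace_stack(["f", "{", "a", "{", "b", "}", "c", "}", "d"]):
        print(tok, ctx)
```

In this sketch the token `c`, which follows the inner block, is processed under the same context as `a`, because the POP restores what the matching PUSH saved.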

Cite

@article{arxiv.2002.04516,
  title  = {Modeling Programs Hierarchically with Stack-Augmented LSTM},
  author = {Fang Liu and Lu Zhang and Zhi Jin},
  journal= {arXiv preprint arXiv:2002.04516},
  year   = {2020}
}

Comments

This paper contains 16 pages, 7 figures, and 9 tables. Accepted by the Journal of Systems and Software.