Convolutional Neural Networks over Tree Structures for Programming Language Processing

Machine Learning 2015-12-09 v2 Neural and Evolutionary Computing Software Engineering

Abstract

Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also aroused growing interest in the artificial intelligence community. However, unlike a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets that match certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
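
To make the idea of a convolution kernel over an abstract syntax tree more concrete, the following is a minimal sketch, not the authors' implementation. It assumes Python's built-in `ast` module as the parser, random untrained weights, a window made of one node plus its direct children, and max pooling over all nodes. The names `DIM`, `FILTERS`, `W_PARENT`, `W_CHILD`, `embed`, `convolve`, and `tree_features` are hypothetical, and the equal weighting of children is a simplification chosen for illustration.

```python
import ast
import math
import random

DIM = 4       # size of each node-type vector (hypothetical choice)
FILTERS = 3   # number of feature detectors (hypothetical choice)

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Assumption: one weight matrix for the window's top node, one shared by children.
W_PARENT = rand_matrix(FILTERS, DIM)
W_CHILD = rand_matrix(FILTERS, DIM)
BIAS = [0.0] * FILTERS

_embeddings = {}


def embed(node):
    """Look up (or lazily create) a random vector for the node's type name."""
    name = type(node).__name__
    if name not in _embeddings:
        _embeddings[name] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return _embeddings[name]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def convolve(node):
    """Apply one window: the node itself plus the average of its direct children."""
    total = matvec(W_PARENT, embed(node))
    children = list(ast.iter_child_nodes(node))
    for child in children:
        contribution = matvec(W_CHILD, embed(child))
        total = [t + c / len(children) for t, c in zip(total, contribution)]
    return [math.tanh(t + b) for t, b in zip(total, BIAS)]


def tree_features(source):
    """Slide the window over every AST node, then max-pool each feature."""
    tree = ast.parse(source)
    feature_maps = [convolve(node) for node in ast.walk(tree)]
    return [max(column) for column in zip(*feature_maps)]


if __name__ == "__main__":
    print(tree_features("def f(x):\n    return x + 1\n"))
```

Because pooling collapses the per-node features into one vector of length `FILTERS`, programs whose trees differ in size and shape map to inputs of the same size, which a downstream classifier could consume. In a trained model the embeddings and weights would be learned rather than drawn at random.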

Cite

@article{arxiv.1409.5718,
  title  = {Convolutional Neural Networks over Tree Structures for Programming Language Processing},
  author = {Lili Mou and Ge Li and Lu Zhang and Tao Wang and Zhi Jin},
  journal= {arXiv preprint arXiv:1409.5718},
  year   = {2015}
}

Comments

Accepted at AAAI-16
