A Library for Representing Python Programs as Graphs for Machine Learning

Machine Learning 2022-08-17 v1 Programming Languages Software Engineering

Abstract

Graph representations of programs are a central element of much research on machine learning for code. We introduce python_graphs, an open-source Python library that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. The library supports constructing control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying it to millions of competitive programming submissions, and showcase its utility for machine learning research.
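
To make the idea of a composite graph concrete, the following is a minimal sketch that uses only Python's standard `ast` module. It is not the python_graphs API: the function name `build_toy_program_graph` and the edge labels `"child"` and `"last_write"` are hypothetical, and the data-flow edges assume straight-line, top-level statements with no branches, loops, or function definitions.

```python
import ast


def build_toy_program_graph(source):
    """Build a toy graph over the AST of `source` (illustrative only).

    Returns (nodes, edges), where each edge is (src_index, dst_index, kind).
    """
    tree = ast.parse(source)
    # Keep module, statement, and expression nodes; skip shared helper
    # nodes such as Load/Store contexts and operator tokens.
    kept = (ast.mod, ast.stmt, ast.expr)
    nodes = [n for n in ast.walk(tree) if isinstance(n, kept)]
    index = {id(n): i for i, n in enumerate(nodes)}
    edges = []

    # Syntactic edges: parent -> child in the AST.
    for node in nodes:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, kept):
                edges.append((index[id(node)], index[id(child)], "child"))

    # Naive data-flow edges (assumption: straight-line code only).
    # Each variable read is linked from the most recent earlier write.
    last_store = {}
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads in a statement see writes from earlier statements only.
        for name in names:
            if isinstance(name.ctx, ast.Load) and name.id in last_store:
                edges.append((last_store[name.id], index[id(name)], "last_write"))
        for name in names:
            if isinstance(name.ctx, ast.Store):
                last_store[name.id] = index[id(name)]

    return nodes, edges


nodes, edges = build_toy_program_graph("x = 1\ny = x + 2\nx = x + y\n")
for src, dst, kind in edges:
    if kind == "last_write":
        print(nodes[src].id, "written at line", nodes[src].lineno,
              "is read at line", nodes[dst].lineno)
```

A real control-flow or data-flow analysis must also account for branching, loops, exceptions, and scoping, which this sketch deliberately ignores.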

Cite

@article{arxiv.2208.07461,
  title  = {A Library for Representing Python Programs as Graphs for Machine Learning},
  author = {David Bieber and Kensen Shi and Petros Maniatis and Charles Sutton and Vincent Hellendoorn and Daniel Johnson and Daniel Tarlow},
  journal= {arXiv preprint arXiv:2208.07461},
  year   = {2022}
}

Comments

21 pages, 14 figures