Modeling Code: Is Text All You Need?

Daniel Nichols; Konstantinos Parasyris; Harshitha Menon; Brian R. Bartoldson; Giorgis Georgakoudis; Tal Ben-Nun; Abhinav Bhatele

Artificial Intelligence 2025-07-16 v1 Software Engineering

Abstract

Code LLMs have recently become extremely popular for modeling source code across a variety of tasks, such as generation, translation, and summarization. However, transformer-based models are limited in their ability to reason about structured, analytical properties of code, such as control and data flow. Previous work has explored modeling these properties with structured data and graph neural networks, but these approaches lack the generative capabilities and scale of modern LLMs. In this work, we introduce a novel approach that combines the strengths of modeling code both as text and in more structured forms.

Keywords

code generation; programming languages; text generation

Cite

@article{arxiv.2507.11467,
  title  = {Modeling Code: Is Text All You Need?},
  author = {Daniel Nichols and Konstantinos Parasyris and Harshitha Menon and Brian R. Bartoldson and Giorgis Georgakoudis and Tal Ben-Nun and Abhinav Bhatele},
  journal= {arXiv preprint arXiv:2507.11467},
  year   = {2025}
}
