Modeling Code: Is Text All You Need?

Artificial Intelligence 2025-07-16 v1 Software Engineering

Abstract

Code LLMs have recently become extremely popular for modeling source code across a variety of tasks, such as generation, translation, and summarization. However, transformer-based models are limited in their ability to reason about structured, analytical properties of code, such as control and data flow. Previous work has explored modeling these properties with structured data and graph neural networks, but these approaches lack the generative capabilities and scale of modern LLMs. In this work, we introduce a novel approach that combines the strengths of modeling code both as text and in more structured forms.

Cite

@article{arxiv.2507.11467,
  title  = {Modeling Code: Is Text All You Need?},
  author = {Daniel Nichols and Konstantinos Parasyris and Harshitha Menon and Brian R. Bartoldson and Giorgis Georgakoudis and Tal Ben-Nun and Abhinav Bhatele},
  journal= {arXiv preprint arXiv:2507.11467},
  year   = {2025}
}