Regression Language Models for Code

Yash Akhauri; Xingyou Song; Arissa Wongpanich; Bryan Lewandowski; Mohamed S. Abdelfattah

Computation and Language; 2026-05-28; v2; Artificial Intelligence; Machine Learning; Performance; Software Engineering

Abstract

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) using a frozen LLM encoder can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM based on T5Gemma obtains a Spearman rank correlation above 0.9 on competitive programming submissions from APPS, and a single unified model achieves an average Spearman rank correlation above 0.5 across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau, 0.46, on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
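
The abstract reports its results as rank correlations (Spearman and Kendall-Tau) between predicted and measured metrics. As a minimal illustration of what those scores measure, the sketch below computes both in plain Python. It is not the authors' evaluation code: the function names, the tie handling (average ranks for Spearman, the tau-a variant for Kendall), and the sample numbers are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(pred, true):
    """Pearson correlation computed on the ranks of the two sequences."""
    rp, rt = average_ranks(pred), average_ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / sqrt(var_p * var_t)


def kendall_tau_a(pred, true):
    """(concordant - discordant) / number of pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(pred)), 2):
        sign = (pred[i] - pred[j]) * (true[i] - true[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(pred) * (len(pred) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up numbers (assumption): measured peak memory in MB vs. predictions.
measured = [12.0, 48.5, 30.1, 95.0, 7.3]
predicted = [15.0, 40.0, 44.0, 80.0, 9.0]

rho = spearman_rho(predicted, measured)
tau = kendall_tau_a(predicted, measured)
print(round(rho, 3), round(tau, 3))  # 0.9 0.8
```

Both scores depend only on ordering, not on absolute error: here a single swapped pair out of ten lowers Kendall-Tau to 0.8 while Spearman stays at 0.9, which is one reason the two numbers in the abstract are not directly comparable.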

Keywords

code generation, tokenization, language modeling

Cite

@article{arxiv.2509.26476,
  title  = {Regression Language Models for Code},
  author = {Yash Akhauri and Xingyou Song and Arissa Wongpanich and Bryan Lewandowski and Mohamed S. Abdelfattah},
  journal= {arXiv preprint arXiv:2509.26476},
  year   = {2026}
}

Comments

Published in International Conference on Machine Learning (ICML) 2026
