Training Transformers as a Universal Computer

Artificial Intelligence 2026-04-29 v1

Abstract

We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution, using PENCIL scaffolding to keep execution space-efficient within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to a variety of human-written programs, including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. The trained model thus achieves out-of-distribution generalization: it evaluates novel programs that lie outside the distribution of programs it was trained on. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.

Cite

@article{arxiv.2604.25166,
  title  = {Training Transformers as a Universal Computer},
  author = {Ruize Xu and Chenxiao Yang and Yanhong Li and David McAllester},
  journal= {arXiv preprint arXiv:2604.25166},
  year   = {2026}
}

Comments

20 pages, 9 figures