Related papers: Training Transformers as a Universal Computer

Looped Transformers as Programmable Computers

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data…

Machine Learning · Computer Science 2023-01-31 Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. These findings have enabled improved understanding of the expressive capacity and generalization abilities of…

Machine Learning · Computer Science 2026-02-10 Xinting Huang, Aleksandra Bakalova, Satwik Bhattamishra, William Merrill, Michael Hahn

Learning Transformer Programs

Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short…

Machine Learning · Computer Science 2023-11-01 Dan Friedman, Alexander Wettig, Danqi Chen

Universal computation is intrinsic to language model decoding

Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is…

Computation and Language · Computer Science 2026-02-11 Alex Lewandowski, Marlos C. Machado, Dale Schuurmans

Transformers are Universal Predictors

We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of…

Machine Learning · Computer Science 2023-07-18 Sourya Basu, Moulik Choraria, Lav R. Varshney

Universal Length Generalization with Turing Programs

Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to…

Machine Learning · Computer Science 2024-07-04 Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the…

Machine Learning · Computer Science 2024-12-16 Spencer Frei, Gal Vardi

Pretrained Transformers as Universal Computation Engines

We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual…

Machine Learning · Computer Science 2021-07-01 Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Transformers Meet In-Context Learning: A Universal Approximation Theory

Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how…

Machine Learning · Computer Science 2025-08-29 Gen Li, Yuchen Jiao, Yu Huang, Yuting Wei, Yuxin Chen

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptrons…

Machine Learning · Computer Science 2025-02-21 Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

Transformers are Efficient Compilers, Provably

Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first…

Programming Languages · Computer Science 2025-01-28 Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du

Thinking Like Transformers

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no…

Machine Learning · Computer Science 2021-07-20 Gail Weiss, Yoav Goldberg, Eran Yahav

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step…

Machine Learning · Computer Science 2022-01-02 Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

An Introduction to Transformers

The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and…

Machine Learning · Computer Science 2026-01-21 Richard E. Turner

Tracr: Compiled Transformers as a Laboratory for Interpretability

We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study…

Machine Learning · Computer Science 2023-11-06 David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

Symbolic Computation via Program Transformation

Symbolic computation is an important approach in automated program analysis. Most state-of-the-art tools perform symbolic computation as interpreters and directly maintain symbolic data. In this paper, we show that it is feasible, and in…

Programming Languages · Computer Science 2019-07-10 Henrich Lauko, Petr Ročkai, Jiří Barnat

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science 2022-12-13 Yuxuan Li, James L. McClelland

TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to process various software intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method that deploys…

Software Engineering · Computer Science 2024-05-10 Qiushi Sun, Nuo Chen, Jianing Wang, Xiang Li, Ming Gao

Transformers are Adaptable Task Planners

Every home is different, and every person likes things done in their particular way. Therefore, home robots of the future need to both reason about the sequential nature of day-to-day tasks and generalize to user's preferences. To this end,…

Robotics · Computer Science 2022-07-07 Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk, Akshara Rai

Learning Elementary Cellular Automata with Transformers

Large Language Models demonstrate remarkable mathematical capabilities but at the same time struggle with abstract reasoning and planning. In this study, we explore whether Transformers can learn to abstract and generalize the rules…

Neural and Evolutionary Computing · Computer Science 2024-12-03 Mikhail Burtsev