Related papers: Concise One-Layer Transformers Can Do Function Eva…

Transformers, parallel computation, and logarithmic depth

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is…

Machine Learning · Computer Science · 2024-02-15 · Clayton Sanford, Daniel Hsu, Matus Telgarsky

Understanding Transformer Reasoning Capabilities via Graph Algorithms

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their…

Machine Learning · Computer Science · 2024-05-30 · Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

Transformers serve as the foundational architecture for large language and video generation models, such as GPT, BERT, SORA and their successors. Empirical studies have demonstrated that real-world data and learning tasks exhibit…

Machine Learning · Computer Science · 2026-05-19 · Zhaiming Shen, Alex Havrilla, Rongjie Lai, Alexander Cloninger, Wenjing Liao

A Logic for Expressing Log-Precision Transformers

One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be…

Machine Learning · Computer Science · 2025-09-12 · William Merrill, Ashish Sabharwal

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to…

Machine Learning · Computer Science · 2025-11-07 · William Merrill, Ashish Sabharwal

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science · 2024-04-03 · Xingwu Chen, Difan Zou

One-layer transformers fail to solve the induction heads task

A simple communication complexity argument proves that no one-layer transformer can solve the induction heads task unless its size is exponentially larger than the size sufficient for a two-layer transformer.

Machine Learning · Computer Science · 2024-08-27 · Clayton Sanford, Daniel Hsu, Matus Telgarsky

Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models

Logical reasoning is central to complex human activities, such as thinking, debating, and planning; it is also a central component of many AI systems as well. In this paper, we investigate the extent to which encoder-only transformer…

Computation and Language · Computer Science · 2024-07-02 · Paulo Pirozelli, Marcos M. José, Paulo de Tarso P. Filho, Anarosa A. F. Brandão, Fabio G. Cozman

Transformers Can Do Arithmetic with the Right Embeddings

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to…

Machine Learning · Computer Science · 2024-12-24 · Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein

Enhancing Transformers for Generalizable First-Order Logical Entailment

Transformers, as the fundamental deep learning architecture, have demonstrated great capability in reasoning. This paper studies the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge and…

Computation and Language · Computer Science · 2025-07-11 · Tianshi Zheng, Jiazheng Wang, Zihao Wang, Jiaxin Bai, Hang Yin, Zheye Deng, Yangqiu Song, Jianxin Li

Representational Strengths and Limitations of Transformers

Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no mathematical description of their benefits and deficiencies as compared with other architectures. In this work we establish both…

Machine Learning · Computer Science · 2023-11-17 · Clayton Sanford, Daniel Hsu, Matus Telgarsky

Algorithmic Capabilities of Random Transformers

Trained transformer models have been found to implement interpretable procedures for tasks like arithmetic and associative recall, but little is understood about how the circuits that implement these procedures originate during training. To…

Machine Learning · Computer Science · 2024-10-08 · Ziqian Zhong, Jacob Andreas

Transformers are Expressive, But Are They Expressive Enough for Regression?

Transformers have become pivotal in Natural Language Processing, demonstrating remarkable success in applications like Machine Translation and Summarization. Given their widespread adoption, several works have attempted to analyze the…

Machine Learning · Computer Science · 2024-09-02 · Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for…

Machine Learning · Computer Science · 2024-07-02 · Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt

What Formal Languages Can Transformers Express? A Survey

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help…

Machine Learning · Computer Science · 2024-09-05 · Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin

On the Expressive Power of Floating-Point Transformers

The study on the expressive power of transformers shows that transformers are permutation equivariant, and they can approximate all permutation-equivariant continuous functions on a compact domain. However, these results are derived under…

Machine Learning · Computer Science · 2026-01-26 · Sejun Park, Yeachan Park, Geonho Hwang

Theoretical limitations of multi-layer Transformer

Transformers, especially the decoder-only variants, are the backbone of most modern large language models; yet we do not have much understanding of their expressive power except for the simple $1$-layer case. Due to the difficulty of…

Machine Learning · Computer Science · 2024-12-05 · Lijie Chen, Binghui Peng, Hongxun Wu

Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth

Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications, underpinning state-of-the-art systems in natural language processing, vision, and beyond. However, their theoretical limitations…

Computation and Language · Computer Science · 2026-02-13 · Michelle Yuan, Weiyi Sun, Amir H. Rezaeian, Jyotika Singh, Sandip Ghoshal, Yao-Ting Wang, Miguel Ballesteros, Yassine Benajiba

The Counting Power of Transformers

Counting properties (e.g. determining whether certain tokens occur more than other tokens in a given input text) have played a significant role in the study of expressiveness of transformers. In this paper, we provide a formal framework for…

Computation and Language · Computer Science · 2026-03-03 · Marco Sälzer, Chris Köcher, Alexander Kozachinskiy, Georg Zetzsche, Anthony Widjaja Lin