Related papers: Looped Transformers as Programmable Computers

200 papers

Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptrons…

Machine Learning · Computer Science 2025-02-21 Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

Previous work on the learnability of transformers, focused primarily on examining their ability to approximate specific algorithmic patterns through training, has largely been data-driven, offering only probabilistic…

Machine Learning · Computer Science 2026-04-23 Debanjan Dutta, Anish Chakrabarty, Faizanuddin Ansari, Swagatam Das

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no…

Machine Learning · Computer Science 2021-07-20 Gail Weiss, Yoav Goldberg, Eran Yahav

We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The…

Computational Complexity · Computer Science 2026-05-07 Lena Ehrmuth, Laura Strieker

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is…

Machine Learning · Computer Science 2024-02-15 Clayton Sanford, Daniel Hsu, Matus Telgarsky

We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts…

Artificial Intelligence · Computer Science 2026-04-29 Ruize Xu, Chenxiao Yang, Yanhong Li, David McAllester

Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no mathematical description of their benefits and deficiencies as compared with other architectures. In this work we establish both…

Machine Learning · Computer Science 2023-11-17 Clayton Sanford, Daniel Hsu, Matus Telgarsky

Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation…

Machine Learning · Computer Science 2020-02-26 Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Error correction code is a major part of the communication physical layer, ensuring the reliable transfer of data over noisy channels. Recently, neural decoders were shown to outperform classical decoding techniques. However, the existing…

Machine Learning · Computer Science 2022-03-30 Yoni Choukroun, Lior Wolf

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science 2024-04-03 Xingwu Chen, Difan Zou

Transformers have demonstrated effectiveness in in-context solving data-fitting problems from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture…

Machine Learning · Computer Science 2024-03-19 Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos

While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood. This paper contributes to the study of the expressive capacity of transformers,…

Machine Learning · Computer Science 2025-03-31 Lena Strobl, Dana Angluin, Robert Frank

In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models such as linear attention and flash-attention. However, the model size and…

Computation and Language · Computer Science 2026-02-04 Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang

The rapid progress seen in terms of large-scale generative AI is largely based on the attention mechanism. It is conversely non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional…

Machine Learning · Computer Science 2025-08-07 Claudius Gros

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence…

Computation and Language · Computer Science 2024-02-06 Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes, Sidak Pal Singh

Despite significant progress in transformer interpretability, an understanding of the computational mechanisms of large language models (LLMs) remains a fundamental challenge. Many approaches interpret a network's hidden representations but…

Machine Learning · Computer Science 2025-10-14 James R. Golden

Deep learning employs multi-layer neural networks trained via the backpropagation algorithm. This approach has achieved success across many domains and relies on adaptive gradient methods such as the Adam optimizer. Sequence modeling…

Machine Learning · Computer Science 2025-07-16 Esmail Gumaan

In this work, quantum transformers are designed and analysed in detail by extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis.…

The transformer architecture is widely used in machine learning models and consists of two alternating sublayers: attention heads and MLPs. We prove that an MLP neuron can be implemented by a masked attention head with internal dimension 1…

Machine Learning · Computer Science 2023-09-18 Robert Huben, Valerie Morris

The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong