English
Related papers

Related papers: Transformers, parallel computation, and logarithmi…

200 papers

Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input…

Computational Complexity · Computer Science 2023-04-28 William Merrill , Ashish Sabharwal

We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to…

Machine Learning · Computer Science 2024-04-03 Xingwu Chen , Difan Zou

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their…

While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood. This paper contributes to the study of the expressive capacity of transformers,…

Machine Learning · Computer Science 2025-03-31 Lena Strobl , Dana Angluin , Robert Frank

Transformers have reshaped machine learning by utilizing attention mechanisms to capture complex patterns in large datasets, leading to significant improvements in performance. This success has contributed to the belief that "bigger means…

Machine Learning · Computer Science 2025-05-28 Hemanth Saratchandran , Damien Teney , Simon Lucey

Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in…

Machine Learning · Computer Science 2024-10-31 Max Vladymyrov , Johannes von Oswald , Mark Sandler , Rong Ge

Transformers process tokens in parallel but are temporally shallow: at position $t$, each layer attends to key-value pairs computed based on the previous layer, yielding a depth capped by the number of layers. Recurrent models offer…

Machine Learning · Computer Science 2026-04-24 Costin-Andrei Oncescu , Depen Morwani , Samy Jelassi , Alexandru Meterez , Mujin Kwun , Sham Kakade

Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no mathematical description of their benefits and deficiencies as compared with other architectures. In this work we establish both…

Machine Learning · Computer Science 2023-11-17 Clayton Sanford , Daniel Hsu , Matus Telgarsky

Transformers have the representational capacity to simulate Massively Parallel Computation (MPC) algorithms, but they suffer from quadratic time complexity, which severely limits their scalability. We introduce an efficient attention…

Machine Learning · Computer Science 2025-09-12 Jingwen Liu , Hantao Yu , Clayton Sanford , Alexandr Andoni , Daniel Hsu

We present a novel automated technique for parallelizing quantum circuits via forward and backward translation to measurement-based quantum computing patterns and analyze the trade off in terms of depth and space complexity. As a result we…

Quantum Physics · Physics 2012-02-22 Anne Broadbent , Elham Kashefi

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no…

Machine Learning · Computer Science 2021-07-20 Gail Weiss , Yoav Goldberg , Eran Yahav

We exhibit some simple gadgets useful in designing shallow parallel circuits for quantum algorithms. We prove that any quantum circuit composed entirely of controlled-not gates or of diagonal gates can be parallelized to logarithmic depth,…

Quantum Physics · Physics 2009-09-25 Cristopher Moore , Martin Nilsson

Both biological cortico-thalamic networks and artificial transformer networks use canonical computations to perform a wide range of cognitive tasks. In this work, we propose that the structure of cortico-thalamic circuits is well suited to…

Neurons and Cognition · Quantitative Biology 2025-08-12 Arno Granier , Walter Senn

Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear…

Machine Learning · Computer Science 2022-08-02 Tan Nguyen , Richard G. Baraniuk , Robert M. Kirby , Stanley J. Osher , Bao Wang

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data…

Machine Learning · Computer Science 2023-01-31 Angeliki Giannou , Shashank Rajput , Jy-yong Sohn , Kangwook Lee , Jason D. Lee , Dimitris Papailiopoulos

Standard Transformers have a fixed computational depth, fundamentally limiting their ability to generalize to tasks requiring variable-depth reasoning, such as multi-hop graph traversal or nested logic. We propose a depth-recurrent…

Machine Learning · Computer Science 2026-03-24 Hung-Hsuan Chen

Language models with recurrent depth, also referred to as universal or looped when considering transformers, are defined by the capacity to increase their computation through the repetition of layers. Recent efforts in pretraining have…

Machine Learning · Computer Science 2025-10-17 Jonas Geiping , Xinyu Yang , Guinan Su

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times…

Transformer based models have shown remarkable capabilities in sequence learning across a wide range of tasks, often performing well on specific task by leveraging input-output examples. Despite their empirical success, a comprehensive…

Machine Learning · Computer Science 2025-06-03 Yifan Hao , Chenlu Ye , Chi Han , Tong Zhang

This paper investigates approximation-theoretic aspects of the in-context learning capability of the transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the…

Machine Learning · Computer Science 2025-10-22 Frank Cole , Yuxuan Zhao , Yulong Lu , Tianhao Zhang
‹ Prev 1 2 3 10 Next ›