Related papers: Efficient Turing Machine Simulation with Transform…
We prove that any Turing machine running on inputs of arbitrary length can be simulated by a constant bit-size transformer, as long as the context window is sufficiently long. This improves previous works, which require scaling up either…
The quadratic complexity of self-attention prevents transformers from scaling effectively to long input sequences. On the other hand, modern GPUs and other specialized hardware accelerators are well-optimized for processing small input…
Chain-of-Thought (CoT) has been shown to empirically improve Transformers' performance, and theoretically increase their expressivity to Turing completeness. However, whether Transformers can learn to generalize to CoT traces longer than…
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. However,…
We show that for all functions $t(n) \geq n$, every multitape Turing machine running in time $t$ can be simulated in space only $O(\sqrt{t \log t})$. This is a substantial improvement over Hopcroft, Paul, and Valiant's simulation of time…
We consider computations of a Turing machine subjected to noise. In every step, the action (the new state and the new content of the observed cell, the direction of the head movement) can differ from that prescribed by the transition…
We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory…
Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning…
As is well known, in classical computation, Turing machines, circuits, multi-stack machines, and multi-counter machines are equivalent, that is, they can simulate each other in polynomial time. In quantum computation, Yao [11] first proved…
Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard…
We show that, for all reasonable functions $T(n)=o(n\log n)$, we can algorithmically verify whether a given one-tape Turing machine runs in time at most $T(n)$. This is a tight bound on the order of growth for the function $T$ because we…
Yao (1993) proved that quantum Turing machines and uniformly generated quantum circuits are polynomially equivalent computational models: $t \geq n$ steps of a quantum Turing machine running on an input of length $n$ can be simulated by a…
Williams (STOC 2025) recently proved that time-$t$ multitape Turing machines can be simulated using $O(\sqrt{t \log t})$ space using the Cook-Mertz (STOC 2024) tree evaluation procedure. As Williams notes, applying this result to fast…
By considering a discrete tape where each cell corresponds to an integer, and thus to a possible sum, a pseudo-polynomial solution can be given to the subset sum problem, which is an NP-complete problem and a cornerstone application for this study,…
Multiway Turing machines (also known as nondeterministic Turing machines or NDTMs) with explicit, simple rules are studied. Even very simple rules are found to generate complex behavior, characterized by complex multiway graphs that can be…
We introduce a new type of generalized Turing machines (GTMs), which are intended as a tool for the mathematician who studies computability in Analysis. In a single tape cell a GTM can store a symbol, a real number, a continuous real…
Chain of Thought (CoT) prompting has been shown to significantly improve the performance of large language models (LLMs), particularly in arithmetic and reasoning tasks, by instructing the model to produce intermediate reasoning steps.…
Device sizing is crucial for meeting performance specifications in operational transconductance amplifiers (OTAs), and this work proposes an automated sizing framework based on a transformer model. The approach first leverages the…
Hamiltonian simulation on quantum computers is strongly constrained by gate counts, motivating techniques to reduce circuit depths. While tensor networks are natural competitors to quantum computers, we instead leverage them to support…
Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions…