Related papers: The Next 700 Program Transformers
Determining whether a program terminates is a core challenge in program analysis with direct implications for correctness, verification, and security. We investigate whether transformer architectures can recognise termination patterns…
We find a series of topological phase transitions of increasing order, beyond the more standard second-order phase transition in a one-dimensional topological superconductor. The jumps in the order of the transitions depend on the range of…
The problem of determining whether or not any program terminates was shown to be undecidable by Turing, but recent advances in the area have allowed this information to be determined for a large class of programs. The classic method for…
Deep learning models such as the Transformer are often constructed by heuristics and experience. To provide a complementary foundation, in this work we study the following problem: Is it possible to find an energy function underlying the…
Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer…
We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on…
The Transformer has become the state-of-the-art translation model, but how each intermediate component contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures. In this…
Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…
Understanding the inner workings of Transformers is crucial for achieving more accurate and efficient predictions. In this work, we analyze the computation performed by Transformers in the layers after the top-1 prediction has become fixed,…
We answer an open question in the theory of transducer degrees initially posed in [1] on the existence of a diamond structure in the transducer hierarchy. Transducer degrees are the equivalence classes formed by word transformations which…
We answer an open question in the theory of transducer degrees on the existence of a diamond structure in the transducer hierarchy. Transducer degrees are the equivalence classes formed by word transformations which can be realized by a…
The growing capabilities of transformer models pave the way for solving increasingly complex NLP tasks. A key to supporting application-specific requirements is the ability to fine-tune. However, compiling a fine-tuning dataset tailored to…
In this article, we study classes of multidimensional subshifts defined by multihead finite automata, in particular the hierarchy of classes of subshifts defined as the number of heads grows. The hierarchy collapses on the third level,…
Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are…
State-of-the-art sequence-to-sequence models for large-scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make…
Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…
Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for…
We prove that for a given deterministic top-down transducer with look-ahead it is decidable whether or not its translation is definable (1) by a linear top-down tree transducer or (2) by a tree homomorphism. We present algorithms that…
Transformers have demonstrated remarkable performance in natural language processing and related domains, as they largely focus on sequential, autoregressive next-token prediction tasks. Yet, they struggle in logical reasoning, not…
Generative AI, in particular large transformer models, is increasingly driving HPC system design in science and industry. We analyze performance characteristics of such transformer models and discuss their sensitivity to the transformer…