Related papers: The Next 700 Program Transformers

Transformers for Program Termination

Determining whether a program terminates is a core challenge in program analysis with direct implications for correctness, verification, and security. We investigate whether transformer architectures can recognise termination patterns…

Programming Languages · Computer Science 2026-04-02 Yoav Alon, Cristina David

Staircase to Higher-Order Topological Phase Transitions

We find a series of topological phase transitions of increasing order, beyond the more standard second-order phase transition in a one-dimensional topological superconductor. The jumps in the order of the transitions depend on the range of…

Statistical Mechanics · Physics 2018-04-04 P. Cats, A. Quelle, O. Viyuela, M. A. Martin-Delgado, C. Morais Smith

Distilling Programs to Prove Termination

The problem of determining whether or not any program terminates was shown to be undecidable by Turing, but recent advances in the area have allowed this information to be determined for a large class of programs. The classic method for…

Logic in Computer Science · Computer Science 2020-08-10 G. W. Hamilton

Transformers from an Optimization Perspective

Deep learning models such as the Transformer are often constructed by heuristics and experience. To provide a complementary foundation, in this work we study the following problem: Is it possible to find an energy function underlying the…

Machine Learning · Computer Science 2023-02-28 Yongyi Yang, Zengfeng Huang, David Wipf

Understanding Addition in Transformers

Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer…

Machine Learning · Computer Science 2024-04-25 Philip Quirke, Fazl Barez

How Many Different Outputs Can a Transformer Generate?

We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on…

Machine Learning · Computer Science 2026-05-22 Maxime Meyer, Mario Michelessa, Caroline Chaux, Vincent Y. F. Tan

Rethinking the Value of Transformer Components

Transformer becomes the state-of-the-art translation model, while it is not well studied how each intermediate component contributes to the model performance, which poses significant challenges for designing optimal architectures. In this…

Computation and Language · Computer Science 2020-11-10 Wenxuan Wang, Zhaopeng Tu

Dissecting Multiplication in Transformers: Insights into LLMs

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science 2024-07-23 Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

Looking Beyond The Top-1: Transformers Determine Top Tokens In Order

Understanding the inner workings of Transformers is crucial for achieving more accurate and efficient predictions. In this work, we analyze the computation performed by Transformers in the layers after the top-1 prediction has become fixed,…

Computation and Language · Computer Science 2024-10-29 Daria Lioubashevski, Tomer Schlank, Gabriel Stanovsky, Ariel Goldstein

A Diamond Structure in the Transducer Hierarchy

We answer an open question in the theory of transducer degrees initially posed in [1] on the existence of a diamond structure in the transducer hierarchy. Transducer degrees are the equivalence classes formed by word transformations which…

Formal Languages and Automata Theory · Computer Science 2023-01-18 Noah Kaufmann

A Diamond Structure in the Transducer Hierarchy

We answer an open question in the theory of transducer degrees on the existence of a diamond structure in the transducer hierarchy. Transducer degrees are the equivalence classes formed by word transformations which can be realized by a…

Formal Languages and Automata Theory · Computer Science 2025-12-17 Noah Kaufmann

Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks

The growing capabilities of transformer models pave the way for solving increasingly complex NLP tasks. A key to supporting application-specific requirements is the ability to fine-tune. However, compiling a fine-tuning dataset tailored to…

Computation and Language · Computer Science 2024-02-13 Solveig Helland, Elena Gavagnin, Alexandre de Spindler

Plane-Walking Automata

In this article, we study classes of multidimensional subshifts defined by multihead finite automata, in particular the hierarchy of classes of subshifts defined as the number of heads grows. The hierarchy collapses on the third level,…

Formal Languages and Automata Theory · Computer Science 2014-08-29 Ville Salo, Ilkka Törmä

Program Transformation to Identify List-Based Parallel Skeletons

Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are…

Programming Languages · Computer Science 2016-07-11 Venkatesh Kannan, G. W. Hamilton

Depth-Adaptive Transformer

State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make…

Computation and Language · Computer Science 2020-02-18 Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science 2022-12-13 Yuxuan Li, James L. McClelland

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for…

Machine Learning · Computer Science 2024-07-02 Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt

Definability Results for Top-Down Tree Transducers

We prove that for a given deterministic top-down transducer with look-ahead it is decidable whether or not its translation is definable (1) by a linear top-down tree transducer or (2) by a tree homomorphism. We present algorithms that…

Formal Languages and Automata Theory · Computer Science 2021-06-01 Sebastian Maneth, Helmut Seidl, Martin Vu

Hierarchical Reasoning Models: Perspectives and Misconceptions

Transformers have demonstrated remarkable performance in natural language processing and related domains, as they largely focus on sequential, autoregressive next-token prediction tasks. Yet, they struggle in logical reasoning, not…

Artificial Intelligence · Computer Science 2025-10-08 Renee Ge, Qianli Liao, Tomaso Poggio

Comprehensive Performance Modeling and System Design Insights for Foundation Models

Generative AI, in particular large transformer models, are increasingly driving HPC system design in science and industry. We analyze performance characteristics of such transformer models and discuss their sensitivity to the transformer…

Machine Learning · Computer Science 2024-10-02 Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Smeet Chheda, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji