Related papers: Carrying over algorithm in transformers

Understanding Addition in Transformers

Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer…

Machine Learning · Computer Science 2024-04-25 Philip Quirke, Fazl Barez

Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure

Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet…

Machine Learning · Computer Science 2024-10-31 Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

Transformers Can Do Arithmetic with the Right Embeddings

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to…

Machine Learning · Computer Science 2024-12-24 Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein

Dissecting Multiplication in Transformers: Insights into LLMs

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science 2024-07-23 Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

Positional Attention: Expressivity and Learnability of Algorithmic Computation

There is a growing interest in the ability of neural networks to execute algorithmic tasks (e.g., arithmetic, summary statistics, and sorting). The goal of this work is to better understand the role of attention in Transformers for…

Machine Learning · Computer Science 2025-06-11 Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis

Transformers converge to invariant algorithmic cores

Large language models exhibit sophisticated capabilities, yet understanding how they work internally remains a central challenge. A fundamental obstacle is that training selects for behavior, not circuitry, so many weight configurations can…

Machine Learning · Computer Science 2026-02-27 Joshua S. Schiffman

Investigating the Limitations of Transformers with Simple Arithmetic Tasks

The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate if the surface form of a number has any influence on how…

Computation and Language · Computer Science 2021-04-14 Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

Algorithmic Capabilities of Random Transformers

Trained transformer models have been found to implement interpretable procedures for tasks like arithmetic and associative recall, but little is understood about how the circuits that implement these procedures originate during training. To…

Machine Learning · Computer Science 2024-10-08 Ziqian Zhong, Jacob Andreas

Positional Description Matters for Transformers Arithmetic

Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities, which paradoxically include remarkable coding abilities. We observe that a crucial challenge is…

Computation and Language · Computer Science 2023-11-28 Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science 2022-03-21 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Transporter Networks: Rearranging the Visual World for Robotic Manipulation

Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple…

Robotics · Computer Science 2022-01-07 Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Ayzaan Wahid, Vikas Sindhwani, Johnny Lee

Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers

Next to scaling considerations, architectural design choices profoundly shape the solution space of transformers. In this work, we analyze the solutions simple transformer blocks implement when tackling the histogram task: counting items in…

Machine Learning · Computer Science 2025-11-13 Freya Behrens, Luca Biggio, Lenka Zdeborová

Modular Arithmetic: Language Models Solve Math Digit by Digit

While recent work has begun to uncover the internal strategies that Large Language Models (LLMs) employ for simple arithmetic tasks, a unified understanding of their underlying mechanisms is still lacking. We extend recent findings showing…

Computation and Language · Computer Science 2025-08-05 Tanja Baeumel, Daniil Gurgurov, Yusser al Ghussin, Josef van Genabith, Simon Ostermann

Small transformer architectures for task switching

The rapid progress seen in terms of large-scale generative AI is largely based on the attention mechanism. It is conversely non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional…

Machine Learning · Computer Science 2025-08-07 Claudius Gros

Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation

Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to…

Machine Learning · Computer Science 2022-07-07 Samuel Cognolato, Alberto Testolin

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science 2022-12-13 Yuxuan Li, James L. McClelland

Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers

In decoder-based LLMs, the representation of a given layer serves two purposes: as input to the next layer during the computation of the current token; and as input to the attention mechanism of future tokens. In this work, we show that the…

Computation and Language · Computer Science 2024-11-01 Amit Ben-Artzy, Roy Schwartz

Transfer learning for ensembles: reducing computation time and keeping the diversity

Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time. The same behaviour holds for ensembles of deep learning models typically superior to a single…

Machine Learning · Computer Science 2022-06-28 Ilya Shashkov, Nikita Balabin, Evgeny Burnaev, Alexey Zaytsev

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this…

Computation and Language · Computer Science 2023-01-31 Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee

Teleportation algorithm using two species of entangled pairs

Teleportation algorithm assumes specific Bell states as input, but actual sources typically generate more than one. This work presents a teleportation algorithm for a two Bell states mixture, including remaining distortion from previous…

Quantum Physics · Physics 2014-10-21 Francisco Delgado