
Related papers: Carrying over algorithm in transformers

200 papers

Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer…

Machine Learning · Computer Science · 2024-04-25 · Philip Quirke, Fazl Barez

Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet…

Machine Learning · Computer Science · 2024-10-31 · Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to…

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science · 2024-07-23 · Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

There is a growing interest in the ability of neural networks to execute algorithmic tasks (e.g., arithmetic, summary statistics, and sorting). The goal of this work is to better understand the role of attention in Transformers for…

Machine Learning · Computer Science · 2025-06-11 · Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, Kimon Fountoulakis

Large language models exhibit sophisticated capabilities, yet understanding how they work internally remains a central challenge. A fundamental obstacle is that training selects for behavior, not circuitry, so many weight configurations can…

Machine Learning · Computer Science · 2026-02-27 · Joshua S. Schiffman

The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate if the surface form of a number has any influence on how…

Computation and Language · Computer Science · 2021-04-14 · Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

Trained transformer models have been found to implement interpretable procedures for tasks like arithmetic and associative recall, but little is understood about how the circuits that implement these procedures originate during training. To…

Machine Learning · Computer Science · 2024-10-08 · Ziqian Zhong, Jacob Andreas

Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities, which paradoxically include remarkable coding abilities. We observe that a crucial challenge is…

Computation and Language · Computer Science · 2023-11-28 · Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-21 · Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple…

Next to scaling considerations, architectural design choices profoundly shape the solution space of transformers. In this work, we analyze the solutions simple transformer blocks implement when tackling the histogram task: counting items in…

Machine Learning · Computer Science · 2025-11-13 · Freya Behrens, Luca Biggio, Lenka Zdeborová

While recent work has begun to uncover the internal strategies that Large Language Models (LLMs) employ for simple arithmetic tasks, a unified understanding of their underlying mechanisms is still lacking. We extend recent findings showing…

Computation and Language · Computer Science · 2025-08-05 · Tanja Baeumel, Daniil Gurgurov, Yusser al Ghussin, Josef van Genabith, Simon Ostermann

The rapid progress seen in terms of large-scale generative AI is largely based on the attention mechanism. It is conversely non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional…

Machine Learning · Computer Science · 2025-08-07 · Claudius Gros

Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to…

Machine Learning · Computer Science · 2022-07-07 · Samuel Cognolato, Alberto Testolin

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next word prediction and image classification benefit from nuanced context sensitivity across high-dimensional…

Machine Learning · Computer Science · 2022-12-13 · Yuxuan Li, James L. McClelland

In decoder-based LLMs, the representation of a given layer serves two purposes: as input to the next layer during the computation of the current token; and as input to the attention mechanism of future tokens. In this work, we show that the…

Computation and Language · Computer Science · 2024-11-01 · Amit Ben-Artzy, Roy Schwartz

Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time. The same behaviour holds for ensembles of deep learning models, which are typically superior to a single…

Machine Learning · Computer Science · 2022-06-28 · Ilya Shashkov, Nikita Balabin, Evgeny Burnaev, Alexey Zaytsev

Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this…

Computation and Language · Computer Science · 2023-01-31 · Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee

The teleportation algorithm assumes specific Bell states as input, but actual sources typically generate more than one. This work presents a teleportation algorithm for a mixture of two Bell states, including remaining distortion from previous…

Quantum Physics · Physics · 2014-10-21 · Francisco Delgado