Related papers: Universal Length Generalization with Turing Progra…

200 papers

The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These…

Computation and Language · Computer Science 2022-11-15 Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively…

Machine Learning · Computer Science 2024-02-15 Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

Transformer language models have demonstrated impressive generalization capabilities in natural language domains, yet we lack a fine-grained understanding of how such generalization arises. In this paper, we investigate length…

Computation and Language · Computer Science 2025-08-05 Ziyang Cai, Nayoung Lee, Avi Schwarzschild, Samet Oymak, Dimitris Papailiopoulos

Chain-of-Thought (CoT) has been shown to empirically improve Transformers' performance, and theoretically increase their expressivity to Turing completeness. However, whether Transformers can learn to generalize to CoT traces longer than…

Machine Learning · Computer Science 2026-04-29 Oliver Kraus, Yash Sarrof, Yuekun Yao, Alexander Koller, Michael Hahn

A major challenge for transformers is generalizing to sequences longer than those observed during training. While previous works have empirically shown that transformers can either succeed or fail at length generalization depending on the…

Machine Learning · Computer Science 2025-05-01 Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true…

Machine Learning · Computer Science 2023-10-25 Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply…

Transformers often struggle with length generalization, meaning they fail to generalize to sequences longer than those encountered during training. While arithmetic tasks are commonly used to study length generalization, certain tasks are…

Machine Learning · Computer Science 2025-04-18 Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun

Transformer-based models excel in various tasks but their generalization capabilities, especially in arithmetic reasoning, remain incompletely understood. Arithmetic tasks provide a controlled framework to explore these capabilities, yet…

Machine Learning · Computer Science 2025-08-07 Xingcheng Xu, Zibo Zhao, Haipeng Zhang, Yanqing Yang

It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to…

Machine Learning · Computer Science 2023-10-03 Pranjal Awasthi, Anupam Gupta

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same…

Machine Learning · Computer Science 2025-05-13 Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee

Length generalization, the ability to solve problems of longer sequences than those observed during training, poses a core challenge of Transformer-based large language models (LLM). Although existing studies have predominantly focused on…

Computation and Language · Computer Science 2026-01-29 Zhouqi Hua, Wenwei Zhang, Chengqi Lyu, Yuzhe Gu, Songyang Gao, Kuikun Liu, Dahua Lin, Kai Chen

Length generalization is the ability of a learning algorithm to learn a hypothesis which generalizes to longer inputs than the inputs in the training set. In this paper, we provide provable guarantees of length generalization for various…

Machine Learning · Computer Science 2025-06-09 Thomas Chen, Tengyu Ma, Zhiyuan Li

We study the problem of length generalization (LG) in transformers: the ability of a model trained on shorter sequences to maintain performance when evaluated on much longer, previously unseen inputs. Prior work by Huang et al. (2025)…

Machine Learning · Computer Science 2025-11-03 Zachary Izzo, Eshaan Nichani, Jason D. Lee

Large language models display remarkable capabilities in logical and mathematical reasoning, allowing them to solve complex tasks. Interestingly, these abilities emerge in networks trained on the simple task of next-token prediction. In…

Machine Learning · Computer Science 2024-07-31 Eran Malach

Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be able to compute a length…

Machine Learning · Computer Science 2026-03-04 Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin

The programming landscape is nowadays being reshaped by the advent of Large Language Models (LLMs) able to automate code-related tasks related to code implementation (e.g., code completion) and comprehension (e.g., code summarization). Such…

Software Engineering · Computer Science 2025-01-10 Nathan Cooper, Rosalia Tufano, Gabriele Bavota, Denys Poshyvanyk

Length Generalization is the essential capacity of autonomous agents to perform tasks in longer contexts than those encountered during training. To systematically study this feat, we test how well models can approximate the next token…

Generalization to novel compound tasks under distribution shift is important for deploying transformer-based language models (LMs). This work investigates Chain-of-Thought (CoT) reasoning as a means to enhance OOD generalization. Through…

Computation and Language · Computer Science 2026-03-31 Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Qian Niu, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

The ability to reason lies at the core of artificial intelligence (AI), and challenging problems usually call for deeper and longer reasoning to tackle. A crucial question about AI reasoning is whether models can extrapolate learned…

Machine Learning · Computer Science 2025-11-11 Yu Huang, Zixin Wen, Aarti Singh, Yuejie Chi, Yuxin Chen