Related papers: Learning State-Tracking from Code Using Linear RNN…

200 papers

Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that seem to require tracking the unobserved state of an evolving world. How do they do this? We study state tracking in LMs trained or…

Computation and Language · Computer Science 2025-11-03 Belinda Z. Li, Zifan Carl Guo, Jacob Andreas

Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state tracking. In particular, a growing body of literature has shown this limitation primarily…

Machine Learning · Computer Science 2026-02-23 M. Reza Ebrahimi, Michaël Defferrard, Sunny Panchal, Roland Memisevic

Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers for long sequences. However, both Transformers and LRNNs struggle to perform state-tracking, which…

Machine Learning · Computer Science 2025-03-19 Riccardo Grazzi, Julien Siems, Arber Zela, Jörg K. H. Franke, Frank Hutter, Massimiliano Pontil

Large Language Models (LLMs) have demonstrated impressive capabilities in solving complex tasks, including those requiring a certain level of reasoning. In this paper, we focus on state tracking, a problem where models need to keep track of…

Computation and Language · Computer Science 2025-11-14 Kiamehr Rezaee, Jose Camacho-Collados, Mohammad Taher Pilehvar

Humans can easily reason about the sequence of high level actions needed to complete tasks, but it is particularly difficult to instil this ability in robots trained from relatively few examples. This work considers the task of neural…

Robotics · Computer Science 2021-02-08 Michael Burke, Kartic Subr, Subramanian Ramamoorthy

Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local…

Computation and Language · Computer Science 2020-10-22 Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian

The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error…

Machine Learning · Computer Science 2026-05-11 Jiwan Chung, Heechan Choi, Seon Joo Kim

State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they…

Machine Learning · Computer Science 2025-03-07 William Merrill, Jackson Petty, Ashish Sabharwal

Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. From the dawn of Machine Learning, several researchers have pursued algorithms and architectures capable…

Machine Learning · Computer Science 2025-08-19 Matteo Tiezzi, Michele Casoni, Alessandro Betti, Marco Gori, Stefano Melacci

Transformers encode structure in sequences via an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking -- the iterative updating of latent variables…

Machine Learning · Computer Science 2026-04-29 Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu

Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art results on a large number of NLP tasks, can still suffer from overfitting during training. In practice, this is usually countered either by applying…

Computation and Language · Computer Science 2022-01-04 Dušan Variš, Ondřej Bojar

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long…

Artificial Intelligence · Computer Science 2018-02-06 Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

Building and maintaining state to learn policies and value functions is critical for deploying reinforcement learning (RL) agents in the real world. Recurrent neural networks (RNNs) have become a key point of interest for the state-building…

Machine Learning · Computer Science 2026-05-19 Matthew Schlegel, Volodymyr Tkachuk, Adam White, Martha White

Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term…

Machine Learning · Computer Science 2019-05-08 Cheng Wang, Mathias Niepert

In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost. With the resurgence of…

Computation and Language · Computer Science 2024-04-10 Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky

The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools…

Machine Learning · Computer Science 2022-03-30 David Bieber, Rishab Goel, Daniel Zheng, Hugo Larochelle, Daniel Tarlow

The paper studies the capabilities of Recurrent-Neural-Network sequence to sequence (RNN seq2seq) models in learning four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions are…

Computation and Language · Computer Science 2024-04-23 Zhengxiang Wang

Tracking entities in procedural language requires understanding the transformations arising from actions on entities as well as those entities' interactions. While self-attention-based pre-trained language encoders like GPT and BERT have…

Computation and Language · Computer Science 2019-09-09 Aditya Gupta, Greg Durrett

Recursive processing is considered a hallmark of human linguistic abilities. A recent study evaluated recursive processing in recurrent neural language models (RNN-LMs) and showed that such models perform below chance level on embedded…

Computation and Language · Computer Science 2021-10-15 Yair Lakretz, Théo Desbordes, Dieuwke Hupkes, Stanislas Dehaene

Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent…

Machine Learning · Computer Science 2026-02-26 Alexander Morgan, Ummay Sumaya Khan, Lingjia Liu, Lizhong Zheng