Related papers: GateLoop: Fully Data-Controlled Linear Recurrence …

Sequence Modeling using Gated Recurrent Neural Networks

In this paper, we have used Recurrent Neural Networks to capture and model human motion data and generate motions by prediction of the next immediate data point at each time-step. Our RNN is armed with recently proposed Gated Recurrent…

Neural and Evolutionary Computing · Computer Science · 2015-01-05 · Mohammad Pezeshki

State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era

Effectively learning from sequential data is a longstanding goal of Artificial Intelligence, especially in the case of long sequences. From the dawn of Machine Learning, several researchers have pursued algorithms and architectures capable…

Machine Learning · Computer Science · 2025-08-19 · Matteo Tiezzi, Michele Casoni, Alessandro Betti, Marco Gori, Stefano Melacci

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

A longstanding challenge for the Machine Learning community is developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformer-based networks (e.g., Large…

Machine Learning · Computer Science · 2024-02-15 · Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating…

Computation and Language · Computer Science · 2025-09-03 · Rishiraj Acharya

ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents

Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles,…

Artificial Intelligence · Computer Science · 2025-10-30 · Zhenyu Zhang, Tianyi Chen, Weiran Xu, Alex Pentland, Jiaxin Pei

CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative,…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-14 · Anindya Mondal, Ayan Banerjee, Sauradip Nag, Josep Llados, Xiatian Zhu, Anjan Dutta

Recurrence-Complete Frame-based Action Models

In recent years, attention-like mechanisms have been used to great success in the space of large language models, unlocking scaling potential to a previously unthinkable extent. "Attention Is All You Need" famously claims RNN cells are not…

Machine Learning · Computer Science · 2025-10-09 · Michael Keiblinger

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient…

Machine Learning · Computer Science · 2021-01-18 · Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds

Parallelizing non-linear sequential models over the sequence length

Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought…

Machine Learning · Computer Science · 2024-01-17 · Yi Heng Lim, Qi Zhu, Joshua Selfridge, Muhammad Firmansyah Kasim

Language Modeling with Gated Convolutional Networks

The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach…

Computation and Language · Computer Science · 2017-09-12 · Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling

We extend the recent latent recurrent modeling to sequential input streams. By interleaving fast, recurrent latent updates with self-organizational ability between slow observation updates, our method facilitates the learning of stable…

Machine Learning · Computer Science · 2026-04-23 · Shota Takashiro, Masanori Koyama, Takeru Miyato, Yusuke Iwasawa, Yutaka Matsuo, Kohei Hayashi

Log-Linear Attention

The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity, however, remain significant bottlenecks. Linear attention and state-space…

Machine Learning · Computer Science · 2026-03-03 · Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim

Retentive Network: A Successor to Transformer for Large Language Models

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection…

Computation and Language · Computer Science · 2023-08-10 · Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

Convolutional Sequence to Sequence Learning

The prevalent approach to sequence-to-sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to…

Computation and Language · Computer Science · 2017-07-26 · Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Dissecting Linear Recurrent Models: How Different Gating Strategies Drive Selectivity and Generalization

Linear recurrent neural networks have emerged as efficient alternatives to the original Transformer's softmax attention mechanism, thanks to their highly parallelizable training and constant memory and computation requirements at inference.…

Machine Learning · Computer Science · 2026-01-21 · Younes Bouhadjar, Maxime Fabre, Felix Schmidt, Emre Neftci

ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding

While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in open-ended visual question answering, they remain vulnerable to hallucinations. These are outputs that contradict or misrepresent input semantics, posing a…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-01 · Jianjiang Yang, Yanshu Li, Ziyan Huang

Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation

Recurrent neural networks with a gating mechanism such as an LSTM or GRU are powerful tools to model sequential data. In the mechanism, a forget gate, which was introduced to control information flow in a hidden state in the RNN, has…

Machine Learning · Statistics · 2021-11-08 · Kentaro Ohno, Atsutoshi Kumagai

Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Transformers have surpassed RNNs in popularity due to their superior abilities in parallel training and long-term dependency modeling. Recently, there has been a renewed interest in using linear RNNs for efficient sequence modeling. These…

Computation and Language · Computer Science · 2023-11-09 · Zhen Qin, Songlin Yang, Yiran Zhong

Linear Recurrent Units for Sequential Recommendation

State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models. Yet such models are computationally expensive and often too slow for real-time recommendation. Furthermore, the self-attention operation…

Information Retrieval · Computer Science · 2023-11-09 · Zhenrui Yue, Yueqi Wang, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang

Linear Attention Sequence Parallelism

Sequence parallelism (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single device. However, for linear sequence modeling methods like linear attention, existing SP approaches do not take…

Machine Learning · Computer Science · 2025-05-19 · Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong