Related papers: Recurrence-Complete Frame-based Action Models

Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN

The recurrent network architecture is a widely used model in sequence modeling, but its serial dependency hinders the computation parallelization, which makes the operation inefficient. The same problem was encountered in serial adder at…

Machine Learning · Computer Science 2021-08-25 Haowei Jiang, Feiwei Qin, Jin Cao, Yong Peng, Yanli Shao

Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

Recurrent neural networks (RNNs) are well suited for solving sequence tasks in resource-constrained systems due to their expressivity and low computational requirements. However, there is still a need to bridge the gap between what RNNs are…

Machine Learning · Computer Science 2023-03-13 Anand Subramoney, Khaleelulla Khan Nazeer, Mark Schöne, Christian Mayr, David Kappel

Human Sentence Processing: Recurrence or Attention?

Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks…

Computation and Language · Computer Science 2022-03-31 Danny Merkx, Stefan L. Frank

Residual Attention Net for Superior Cross-Domain Time Sequence Modeling

We present a novel architecture, residual attention net (RAN), which merges a sequence architecture, universal transformer, and a computer vision architecture, residual net, with a high-way architecture for cross-domain sequence modeling.…

Machine Learning · Computer Science 2020-01-14 Seth H. Huang, Xu Lingjie, Jiang Congwei

Reversible Recurrent Neural Networks

Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs---RNNs for which the…

Machine Learning · Computer Science 2018-10-26 Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Rethinking Full Connectivity in Recurrent Neural Networks

Recurrent neural networks (RNNs) are omnipresent in sequence modeling tasks. Practical models usually consist of several layers of hundreds or thousands of neurons which are fully connected. This places a heavy computational and memory…

Machine Learning · Computer Science 2019-05-30 Matthijs Van Keirsbilck, Alexander Keller, Xiaodong Yang

GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling

Linear Recurrence has proven to be a powerful tool for modeling long sequences efficiently. In this work, we show that existing models fail to take full advantage of its potential. Motivated by this finding, we develop GateLoop, a…

Machine Learning · Computer Science 2024-01-30 Tobias Katsch

Attention as an RNN

The advent of Transformers marked a significant breakthrough in sequence modelling, providing a highly performant architecture capable of leveraging GPU parallelism. However, Transformers are computationally expensive at inference time,…

Machine Learning · Computer Science 2024-05-29 Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori

Agglomerative Attention

Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows…

Machine Learning · Computer Science 2019-07-16 Matthew Spellings

Attention is All You Need Until You Need Retention

This work introduces a novel Retention Layer mechanism for Transformer based architectures, addressing their inherent lack of intrinsic retention capabilities. Unlike human cognition, which can encode and dynamically recall symbolic…

Machine Learning · Computer Science 2025-01-17 M. Murat Yaslioglu

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner. Recent work has suggested the possibility that general attention mechanisms used by…

Machine Learning · Computer Science 2020-01-01 Thomas Dowdell, Hongyu Zhang

Attend and Diagnose: Clinical Time Series Analysis using Attention Models

With widespread adoption of electronic health records, there is an increased emphasis for predictive models that can effectively deal with clinical time-series data. Powered by Recurrent Neural Network (RNN) architectures with Long…

Machine Learning · Statistics 2018-07-17 Huan Song, Deepta Rajan, Jayaraman J. Thiagarajan, Andreas Spanias

The Importance of Being Recurrent for Modeling Hierarchical Structure

Recent work has shown that recurrent neural networks (RNNs) can implicitly capture and exploit hierarchical information when trained to solve common natural language processing tasks such as language modeling (Linzen et al., 2016) and…

Computation and Language · Computer Science 2018-08-29 Ke Tran, Arianna Bisazza, Christof Monz

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

A longstanding challenge for the Machine Learning community is the one of developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformers-based networks (e.g., Large…

Machine Learning · Computer Science 2024-02-15 Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci

Retentive Network: A Successor to Transformer for Large Language Models

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection…

Computation and Language · Computer Science 2023-08-10 Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

Gated recurrent neural networks discover attention

Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear…

Machine Learning · Computer Science 2024-02-08 Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento

A Critical Review of Recurrent Neural Networks for Sequence Learning

Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video…

Machine Learning · Computer Science 2015-10-20 Zachary C. Lipton, John Berkowitz, Charles Elkan

Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning

Building and maintaining state to learn policies and value functions is critical for deploying reinforcement learning (RL) agents in the real world. Recurrent neural networks (RNNs) have become a key point of interest for the state-building…

Machine Learning · Computer Science 2026-05-19 Matthew Schlegel, Volodymyr Tkachuk, Adam White, Martha White

R-Transformer: Recurrent Neural Network Enhanced Transformer

Recurrent Neural Networks have long been the dominating choice for sequence modeling. However, it severely suffers from two issues: impotent in capturing very long-term dependencies and unable to parallelize the sequential computation…

Machine Learning · Computer Science 2019-07-15 Zhiwei Wang, Yao Ma, Zitao Liu, Jiliang Tang

Variable Computation in Recurrent Neural Networks

Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data. Much of this progress has been achieved through devising recurrent units and architectures with the…

Machine Learning · Statistics 2017-03-06 Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov