Related papers: Encoding-based Memory Modules for Recurrent Neural…
The effectiveness of recurrent neural networks can be largely influenced by their ability to store into their dynamical memory information extracted from input sequences at different frequencies and timescales. Such a feature can be…
Training RNNs to learn long-term dependencies is difficult due to vanishing gradients. We explore an alternative solution based on explicit memorization using linear autoencoders for sequences, which allows to maximize the short-term memory…
Recurrent neural networks can learn complex transduction problems that require maintaining and actively exploiting a memory of their inputs. Such models traditionally consider memory and input-output functionalities indissolubly entangled.…
The brain is targeted for processing temporal sequence information. It remains largely unclear how the brain learns to store and retrieve sequence memories. Here, we study how recurrent networks of binary neurons learn sequence attractors…
Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due…
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale…
Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks…
Sequential activation of neurons is a common feature of network activity during a variety of behaviors, including working memory and decision making. Previous network models for sequences and memory emphasized specialized architectures in…
Encoding models are used for predicting brain activity in response to sensory stimuli with the objective of elucidating how sensory information is represented in the brain. Encoding models typically comprise a nonlinear transformation of…
Neural Machine Translation model is a sequence-to-sequence converter based on neural networks. Existing models use recurrent neural networks to construct both the encoder and decoder modules. In alternative research, the recurrent networks…
Working memory is a cognitive function involving the storage and manipulation of latent information over brief intervals of time, thus making it crucial for context-dependent computation. Here, we use a top-down modeling approach to examine…
Learning to remember long sequences remains a challenging task for recurrent neural networks. Register memory and attention mechanisms were both proposed to resolve the issue with either high computational cost to retain memory…
In this paper, we study novel neural network structures to better model long term dependency in sequential data. We propose to use more memory units to keep track of more preceding states in recurrent neural networks (RNNs), which are all…
Recurrent neural networks have been the dominant models for many speech and language processing tasks. However, we understand little about the behavior and the class of functions recurrent networks can realize. Moreover, the heuristics used…
The key challenge of sequence representation learning is to capture the long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks to capture temporal…
Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network…
Recent advances in large language models have shown that autoregressive modeling can generate complex and novel sequences that have many real-world applications. However, these models must generate outputs autoregressively, which becomes…
Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However,…
We present the discriminative recurrent sparse auto-encoder model, comprising a recurrent encoder of rectified linear units, unrolled for a fixed number of iterations, and connected to two linear decoders that reconstruct the input and…
The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to…