Related papers: Maelstrom Networks

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.…

Computer Vision and Pattern Recognition · Computer Science · 2016-06-02 · Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Think Before You Act: Decision Transformers with Working Memory

Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting…

Machine Learning · Computer Science · 2024-05-30 · Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu

Task Agnostic Continual Learning via Meta Learning

While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distribution is provided…

Machine Learning · Statistics · 2019-06-13 · Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu

Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning

Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However,…

Machine Learning · Statistics · 2017-12-14 · Zhen He, Shaobing Gao, Liang Xiao, Daxue Liu, Hangen He, David Barber

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

A longstanding challenge for the Machine Learning community is the one of developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformers-based networks (e.g., Large…

Machine Learning · Computer Science · 2024-02-15 · Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci

Separation of Memory and Processing in Dual Recurrent Neural Networks

We explore a neural network architecture that stacks a recurrent layer and a feedforward layer that is also connected to the input, and compare it to standard Elman and LSTM architectures in terms of accuracy and interpretability. When…

Neural and Evolutionary Computing · Computer Science · 2020-05-29 · Christian Oliva, Luis F. Lago-Fernández

Recurrent Neural Network Training with Dark Knowledge Transfer

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM), have gained much attention in automatic speech recognition (ASR). Although some successful stories have been reported, training RNNs remains highly challenging,…

Machine Learning · Statistics · 2016-09-21 · Zhiyuan Tang, Dong Wang, Zhiyong Zhang

Learning to Remember from a Multi-Task Teacher

Recent studies on catastrophic forgetting during sequential learning typically focus on fixing the accuracy of the predictions for a previously learned task. In this paper we argue that the outputs of neural networks are subject to rapid…

Machine Learning · Computer Science · 2020-02-14 · Yuwen Xiong, Mengye Ren, Raquel Urtasun

Using Fast Weights to Attend to the Recent Past

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among…

Machine Learning · Statistics · 2016-12-06 · Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation

Artificial neural networks encounter a notable challenge known as continual learning, which involves acquiring knowledge of multiple tasks over an extended period. This challenge arises due to the tendency of previously learned weights to…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-28 · Yonatan Sverdlov, Shimon Ullman

Learning Over Long Time Lags

The advantage of recurrent neural networks (RNNs) in learning dependencies between time-series data has distinguished RNNs from other deep learning models. Recently, many advances are proposed in this emerging field. However, there is a…

Neural and Evolutionary Computing · Computer Science · 2016-02-16 · Hojjat Salehinejad

Self-Attention Meta-Learner for Continual Learning

Continual learning aims to provide intelligent agents capable of learning multiple tasks sequentially with neural networks. One of its main challenges, catastrophic forgetting, is caused by the neural networks' non-optimal ability to learn…

Machine Learning · Computer Science · 2021-01-29 · Ghada Sokar, Decebal Constantin Mocanu, Mykola Pechenizkiy

LiteLSTM Architecture for Deep Recurrent Neural Networks

Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning spatiotemporal sequential data. However, it requires significant computational power for learning and implementing from both software and hardware…

Machine Learning · Computer Science · 2022-10-26 · Nelly Elsayed, Zag ElSayed, Anthony S. Maida

Continual Learning in Recurrent Neural Networks

While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is…

Machine Learning · Computer Science · 2021-03-11 · Benjamin Ehret, Christian Henning, Maria R. Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F. Grewe

Wide Neural Networks Forget Less Catastrophically

A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to the distribution shifts. While the recent progress in continual…

Machine Learning · Computer Science · 2022-07-15 · Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin, Huiyi Hu, Razvan Pascanu, Dilan Gorur, Mehrdad Farajtabar

Three scenarios for continual learning

Standard artificial neural networks suffer from the well-known issue of catastrophic forgetting, making continual or lifelong learning difficult for machine learning. In recent years, numerous methods have been proposed for continual…

Machine Learning · Computer Science · 2019-04-18 · Gido M. van de Ven, Andreas S. Tolias

Deep Learning: a new definition of artificial neuron with double weight

Deep learning is a subset of a broader family of machine learning methods based on learning data representations. These models are inspired by human biological nervous systems, even if there are various differences pertaining to the…

Neural and Evolutionary Computing · Computer Science · 2019-05-22 · Adriano Baldeschi, Raffaella Margutti, Adam Miller

Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks

Artificial neural networks are well-known to be susceptible to catastrophic forgetting when continually learning from sequences of tasks. Various continual (or "incremental") learning approaches have been proposed to avoid catastrophic…

Machine Learning · Computer Science · 2020-02-19 · Muhammad Umer, Glenn Dawson, Robi Polikar

Momentum Residual Neural Networks

The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this…

Machine Learning · Computer Science · 2021-07-23 · Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

Block Neural Network Avoids Catastrophic Forgetting When Learning Multiple Task

In the present work we propose a Deep Feed Forward network architecture which can be trained according to a sequential learning paradigm, where tasks of increasing difficulty are learned sequentially, yet avoiding catastrophic forgetting.…

Neural and Evolutionary Computing · Computer Science · 2017-11-29 · Guglielmo Montone, J. Kevin O'Regan, Alexander V. Terekhov