Related papers: Sparse Distributed Memory is a Continual Learner
Meta-learning consists in learning learning algorithms. We use a Long Short Term Memory (LSTM) based network to learn to compute on-line updates of the parameters of another neural network. These parameters are stored in the cell state of…
Continual learning is the ability to acquire new knowledge without forgetting the previously learned one, assuming no further access to past training data. Neural network approximators trained with gradient descent are known to fail in this…
Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks…
Artificial Neural Networks (ANNs) have emerged as hot topics in the research community. Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the…
In a recent article we described a new type of deep neural network - a Perpetual Learning Machine (PLM) - which is capable of learning 'on the fly' like a brain by existing in a state of Perpetual Stochastic Gradient Descent (PSGD). Here,…
Training a deep neural network requires a large amount of single-task data and involves a long time-consuming optimization phase. This is not scalable to complex, realistic environments with new unexpected changes. Humans can perform fast…
Neural network learning is usually time-consuming since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite its success of existing works in accelerating propagation through sparseness, the…
The definition of a Neural Network architecture is one of the most critical and challenging tasks to perform. In this paper, we propose ParallelMLPs. ParallelMLPs is a procedure to enable the training of several independent Multilayer…
In this paper, we introduce a novel layer designed to be used as the output of pre-trained neural networks in the context of classification. Based on Associative Memories, this layer can help design Deep Neural Networks which support…
A core component present in many successful neural network architectures, is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer…
Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in…
Efficient continual learning in humans is enabled by a rich set of neurophysiological mechanisms and interactions between multiple memory systems. The brain efficiently encodes information in non-overlapping sparse codes, which facilitates…
While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning. In stark contrast, biological neural networks continually adapt to…
A fundamental task for artificial intelligence is learning. Deep Neural Networks have proven to cope perfectly with all learning paradigms, i.e. supervised, unsupervised, and reinforcement learning. Nevertheless, traditional deep learning…
Continual learning, or lifelong learning, is a formidable current challenge to machine learning. It requires the learner to solve a sequence of $k$ different learning tasks, one after the other, while retaining its aptitude for earlier…
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered. Existing CL approaches often keep a buffer of previously-seen samples,…
Semiparametric language models (LMs) have shown promise in continuously learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional…
Neural networks use their hidden layers to transform input data into linearly separable data clusters, with a linear or a perceptron type output layer making the final projection on the line perpendicular to the discriminating hyperplane.…
We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…
The human brain can self-organize rich and diverse sparse neural pathways to incrementally master hundreds of cognitive tasks. However, most existing continual learning algorithms for deep artificial and spiking neural networks are unable…