Related papers: Overcoming Multi-Model Forgetting

Neural Networks Remember More: The Power of Parameter Isolation and Combination

Catastrophic forgetting is a pervasive issue for pre-trained language models (PLMs) during continual learning, where models lose previously acquired knowledge when sequentially trained on a series of tasks. The model's ability to retain old…

Computation and Language · Computer Science 2025-02-18 Biqing Zeng, Zehan Li, Aladdin Ayesh

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast,…

Machine Learning · Computer Science 2026-05-26 Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

Why Do Neural Networks Forget: A Study of Collapse in Continual Learning

Catastrophic forgetting is a major problem in continual learning, and lots of approaches arise to reduce it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests…

Machine Learning · Computer Science 2026-03-06 Yunqin Zhu, Jun Jin

Maintaining Plasticity in Deep Continual Learning

Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a…

Machine Learning · Computer Science 2024-04-11 Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton

Meta Continual Learning

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a…

Machine Learning · Computer Science 2018-06-20 Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim

A study on the plasticity of neural networks

One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task. Usually this is done through fine-tuning, where an implicit…

Machine Learning · Computer Science 2023-10-17 Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu, Claudia Clopath

Predicting Plasticity in Deep Continual Learning: A Theoretical Perspective

Deep continual learning requires models to adapt to new tasks without retraining from scratch. However, neural networks can lose their ability to adapt to new tasks after training on previous ones, a phenomenon known as loss of plasticity.…

Machine Learning · Computer Science 2026-05-12 Jiuqi Wang, Jayanth Srinivasa, Claire Chen, Shuze Daniel Liu, Ali Payani, Shangtong Zhang

Mixed-Privacy Forgetting in Deep Networks

We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of…

Machine Learning · Computer Science 2021-06-22 Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, Stefano Soatto

Memory-based Parameter Adaptation

Deep neural networks have excelled on a wide range of problems, from vision to language and game playing. Neural networks very gradually incorporate information into weights as they process data, requiring very low learning rates. If the…

Machine Learning · Statistics 2018-03-01 Pablo Sprechmann, Siddhant M. Jayakumar, Jack W. Rae, Alexander Pritzel, Adrià Puigdomènech Badia, Benigno Uria, Oriol Vinyals, Demis Hassabis, Razvan Pascanu, Charles Blundell

Disentangling the Causes of Plasticity Loss in Neural Networks

Underpinning the past decades of work on the design, initialization, and optimization of neural networks is a seemingly innocuous assumption: that the network is trained on a stationary data distribution. In settings where this…

Machine Learning · Computer Science 2024-03-01 Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, Will Dabney

Fortuitous Forgetting in Connectionist Networks

Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the…

Machine Learning · Computer Science 2022-02-02 Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville

Understanding plasticity in neural networks

Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose…

Machine Learning · Computer Science 2023-11-28 Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila Pires, Razvan Pascanu, Will Dabney

Continual Learning in Vision-Language Models via Aligned Model Merging

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to…

Computer Vision and Pattern Recognition · Computer Science 2025-06-05 Ghada Sokar, Gintare Karolina Dziugaite, Anurag Arnab, Ahmet Iscen, Pablo Samuel Castro, Cordelia Schmid

On Local Overfitting and Forgetting in Deep Neural Networks

The infrequent occurrence of overfitting in deep neural networks is perplexing: contrary to theoretical expectations, increasing model size often enhances performance in practice. But what if overfitting does occur, though restricted to…

Machine Learning · Computer Science 2025-01-08 Uri Stern, Tomer Yaacoby, Daphna Weinshall

Restoring Neural Network Plasticity for Faster Transfer Learning

Transfer learning with models pretrained on ImageNet has become a standard practice in computer vision. Transfer learning refers to fine-tuning pretrained weights of a neural network on a downstream task, typically unrelated to ImageNet.…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Xander Coetzer, Arné Schreuder, Anna Sergeevna Bosman

Negotiated Representations to Prevent Forgetting in Machine Learning Applications

Catastrophic forgetting is a significant challenge in the field of machine learning, particularly in neural networks. When a neural network learns to perform well on a new task, it often forgets its previously acquired knowledge or…

Machine Learning · Computer Science 2023-12-04 Nuri Korhan, Ceren Öner

Synaptic Metaplasticity in Binarized Neural Networks

While deep neural networks have surpassed human performance in multiple situations, they are prone to catastrophic forgetting: upon training a new task, they rapidly forget previously learned ones. Neuroscience studies, based on idealized…

Neural and Evolutionary Computing · Computer Science 2021-03-24 Axel Laborieux, Maxence Ernoult, Tifenn Hirtzlin, Damien Querlioz

Neural Network Retraining for Model Serving

We propose incremental (re)training of a neural network model to cope with a continuous flow of new data in inference during model serving. As such, this is a life-long learning process. We address two challenges of life-long retraining:…

Machine Learning · Computer Science 2020-04-30 Diego Klabjan, Xiaofeng Zhu

An Empirical Investigation of the Role of Pre-training in Lifelong Learning

The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating…

Machine Learning · Computer Science 2023-08-30 Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell

Sharing pattern submodels for prediction with missing values

Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels has been proposed as…

Machine Learning · Computer Science 2023-11-27 Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson