Related papers: Continual Learning via Sparse Memory Finetuning

Improving Sparse Memory Finetuning

Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full…

Machine Learning · Computer Science · 2026-04-08 · Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the…

Computation and Language · Computer Science · 2026-05-06 · Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi

Meta-Learning with Sparse Experience Replay for Lifelong Language Learning

Lifelong learning requires models that can continuously learn from sequential streams of data without suffering catastrophic forgetting due to shifts in data distributions. Deep learning models have thrived in the non-sequential learning…

Computation and Language · Computer Science · 2021-07-27 · Nithin Holla, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Scaling Laws for Forgetting When Fine-Tuning Large Language Models

We study and quantify the problem of forgetting when fine-tuning pre-trained large language models (LLMs) on a downstream task. We find that parameter-efficient fine-tuning (PEFT) strategies, such as Low-Rank Adapters (LoRA), still suffer…

Computation and Language · Computer Science · 2024-01-12 · Damjan Kalajdzievski

Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the…

Computation and Language · Computer Science · 2024-02-19 · Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these…

Machine Learning · Computer Science · 2026-02-11 · Jonathan Svirsky, Yehonathan Refael, Ofir Lindenbaum

Learning Large Scale Sparse Models

In this work, we consider learning sparse models in large-scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur in such a challenging scenario: (i)…

Machine Learning · Statistics · 2023-01-31 · Atul Dhingra, Jie Shen, Nicholas Kleene

Temporal Sampling for Forgotten Reasoning in LLMs

Fine-tuning large language models (LLMs) is intended to improve their reasoning capabilities, yet we uncover a counterintuitive effect: models often forget how to solve problems they previously answered correctly during training. We term…

Artificial Intelligence · Computer Science · 2025-05-27 · Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran

LoRS: Efficient Low-Rank Adaptation for Sparse Large Language Model

Existing low-rank adaptation (LoRA) methods face challenges on sparse large language models (LLMs) due to the inability to maintain sparsity. Recent works introduced methods that maintain sparsity by augmenting LoRA techniques with…

Computation and Language · Computer Science · 2025-01-16 · Yuxuan Hu, Jing Zhang, Xiaodong Chen, Zhe Zhao, Cuiping Li, Hong Chen

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Reinforcement learning (RL) yields substantial improvements in large language models' (LLMs') downstream task performance and alignment with human values. Surprisingly, such large gains result from updating only a small subnetwork comprising…

Machine Learning · Computer Science · 2025-12-19 · Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng

MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs

Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably, without retraining or forgetting previous information, remains a…

Computation and Language · Computer Science · 2026-02-03 · Ke Wang, Yiming Qin, Nikolaos Dimitriadis, Alessandro Favero, Pascal Frossard

Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA

Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA…

Machine Learning · Computer Science · 2025-06-27 · Fei Wang, Baochun Li

Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks,…

Machine Learning · Computer Science · 2024-03-01 · Weijieying Ren, Xinlong Li, Lei Wang, Tianxiang Zhao, Wei Qin

Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning

In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can lead to severe…

Machine Learning · Computer Science · 2025-12-23 · Yuqing Zhao, Jiannong Cao, Divya Saxena, Xiaoyun Liu, Changlin Song, Bo Yuan, Julie McCann

Meta Continual Learning

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a…

Machine Learning · Computer Science · 2018-06-20 · Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast,…

Machine Learning · Computer Science · 2026-05-26 · Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

Shared LoRA Subspaces for almost Strict Continual Learning

Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forgetting and the high cost of retraining. While parameter-efficient tuning methods…

Machine Learning · Computer Science · 2026-02-06 · Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Rama Chellappa, Alan Yuille

Sparse Fine-tuning for Inference Acceleration of Large Language Models

We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard…

Computation and Language · Computer Science · 2023-10-16 · Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

Sparse Orthogonal Parameters Tuning for Continual Learning

Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention. These methods typically refrain from updating the pre-trained parameters…

Machine Learning · Computer Science · 2026-05-22 · Kun-Peng Ning, Hai-Jian Ke, Yu-Yang Liu, Jia-Yu Yao, Yong-Hong Tian, Li Yuan

Sparse Learning for Variable Selection with Structures and Nonlinearities

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science · 2019-03-27 · Magda Gregorova