Related papers: Continual Learning via Sparse Memory Finetuning

200 papers

Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full…

Machine Learning · Computer Science 2026-04-08 Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the…

Computation and Language · Computer Science 2026-05-06 Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi

Lifelong learning requires models that can continuously learn from sequential streams of data without suffering catastrophic forgetting due to shifts in data distributions. Deep learning models have thrived in the non-sequential learning…

Computation and Language · Computer Science 2021-07-27 Nithin Holla, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

We study and quantify the problem of forgetting when fine-tuning pre-trained large language models (LLMs) on a downstream task. We find that parameter-efficient fine-tuning (PEFT) strategies, such as Low-Rank Adapters (LoRA), still suffer…

Computation and Language · Computer Science 2024-01-12 Damjan Kalajdzievski

Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the…

Computation and Language · Computer Science 2024-02-19 Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these…

Machine Learning · Computer Science 2026-02-11 Jonathan Svirsky, Yehonathan Refael, Ofir Lindenbaum

In this work, we consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such challenging scenario: (i)…

Machine Learning · Statistics 2023-01-31 Atul Dhingra, Jie Shen, Nicholas Kleene

Fine-tuning large language models (LLMs) is intended to improve their reasoning capabilities, yet we uncover a counterintuitive effect: models often forget how to solve problems they previously answered correctly during training. We term…

Artificial Intelligence · Computer Science 2025-05-27 Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran

Existing low-rank adaptation (LoRA) methods face challenges on sparse large language models (LLMs) due to the inability to maintain sparsity. Recent works introduced methods that maintain sparsity by augmenting LoRA techniques with…

Computation and Language · Computer Science 2025-01-16 Yuxuan Hu, Jing Zhang, Xiaodong Chen, Zhe Zhao, Cuiping Li, Hong Chen

Reinforcement learning (RL) yields substantial improvements in large language models (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from updating only a small subnetwork comprising…

Machine Learning · Computer Science 2025-12-19 Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng

Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably, without retraining or forgetting previous information, remains a…

Computation and Language · Computer Science 2026-02-03 Ke Wang, Yiming Qin, Nikolaos Dimitriadis, Alessandro Favero, Pascal Frossard

Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA…

Machine Learning · Computer Science 2025-06-27 Fei Wang, Baochun Li

Existing research has shown that large language models (LLMs) exhibit remarkable performance in language understanding and generation. However, when LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks,…

Machine Learning · Computer Science 2024-03-01 Weijieying Ren, Xinlong Li, Lei Wang, Tianxiang Zhao, Wei Qin

In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can lead to severe…

Machine Learning · Computer Science 2025-12-23 Yuqing Zhao, Jiannong Cao, Divya Saxena, Xiaoyun Liu, Changlin Song, Bo Yuan, Julie McCann

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a…

Machine Learning · Computer Science 2018-06-20 Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast,…

Machine Learning · Computer Science 2026-05-26 Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

Adapting large pretrained models to new tasks efficiently and continually is crucial for real-world deployment but remains challenging due to catastrophic forgetting and the high cost of retraining. While parameter-efficient tuning methods…

Machine Learning · Computer Science 2026-02-06 Prakhar Kaushik, Ankit Vaidya, Shravan Chaudhari, Rama Chellappa, Alan Yuille

We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard…

Computation and Language · Computer Science 2023-10-16 Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

Continual learning methods based on pre-trained models (PTM) have recently gained attention which adapt to successive downstream tasks without catastrophic forgetting. These methods typically refrain from updating the pre-trained parameters…

Machine Learning · Computer Science 2026-05-22 Kun-Peng Ning, Hai-Jian Ke, Yu-Yang Liu, Jia-Yu Yao, Yong-Hong Tian, Li Yuan

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science 2019-03-27 Magda Gregorova