Related papers: Improving Sparse Memory Finetuning

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the…

Computation and Language · Computer Science 2026-05-06 Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi

Continual Learning via Sparse Memory Finetuning

Modern language models are powerful, but typically static after deployment. A major obstacle to building models that continually learn over time is catastrophic forgetting, where updating on new data erases previously acquired capabilities.…

Computation and Language · Computer Science 2025-10-20 Jessy Lin, Luke Zettlemoyer, Gargi Ghosh, Wen-Tau Yih, Aram Markosyan, Vincent-Pierre Berges, Barlas Oğuz

Continual Fine-Tuning of Large Language Models via Program Memory

Parameter-Efficient Fine-Tuning (PEFT), particularly Low-Rank Adaptation (LoRA), has become a standard approach for adapting Large Language Models (LLMs) under limited compute. However, in continual settings where models are updated…

Machine Learning · Computer Science 2026-05-14 Hung Le, Svetha Venkatesh

Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes

Large language models (LLMs) are increasingly prevalent across diverse applications. However, their enormous size limits storage and processing capabilities to a few well-resourced stakeholders. As a result, most applications rely on…

Computation and Language · Computer Science 2025-11-05 Mohammadsajad Alipour, Mohammad Mohammadi Amiri

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Large language models (LLMs) have achieved remarkable success across various tasks but face deployment challenges due to their massive computational demands. While post-training pruning methods like SparseGPT and Wanda can effectively…

Artificial Intelligence · Computer Science 2026-04-21 Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

Sparse Fine-tuning for Inference Acceleration of Large Language Models

We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard…

Computation and Language · Computer Science 2023-10-16 Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning

Supervised fine-tuning (SFT) plays a critical role for pretrained large language models (LLMs), notably enhancing their capacity to acquire domain-specific knowledge while preserving or potentially augmenting their general-purpose…

Machine Learning · Computer Science 2026-03-31 Ali Taheri, Alireza Taban, Qizhou Wang, Shanshan Ye, Abdolreza Mirzaei, Tongliang Liu, Bo Han

HFT: Half Fine-Tuning for Large Language Models

Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities, enabling LLMs to follow natural language instructions or align with human preferences. However, it carries the…

Computation and Language · Computer Science 2024-04-30 Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu

Scaling Sparse Fine-Tuning to Large Language Models

Large Language Models (LLMs) are difficult to fully fine-tune (e.g., with instructions or human feedback) due to their sheer number of parameters. A family of parameter-efficient sparse fine-tuning methods have proven promising in terms of…

Computation and Language · Computer Science 2024-02-05 Alan Ansell, Ivan Vulić, Hannah Sterz, Anna Korhonen, Edoardo M. Ponti

Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting

Supervised Fine-Tuning (SFT) is a critical step for enhancing the instruction-following capabilities of Large Language Models (LLMs) and adapting them to specialized domains. However, SFT often leads to a degradation of the model's general…

Computation and Language · Computer Science 2025-07-01 Fei Ding, Baiqiao Wang

Reversing Large Language Models for Efficient Training and Fine-Tuning

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, oftentimes, LLMs are fine-tuned to address a specific task, given the pretrained weights of a pre-trained LLM considered a foundation model. In…

Computation and Language · Computer Science 2025-12-05 Eshed Gal, Moshe Eliasof, Javier Turek, Uri Ascher, Eran Treister, Eldad Haber

Memory Bank Compression for Continual Adaptation of Large Language Models

Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve their knowledge quickly becomes outdated. Continual learning aims to update LLMs with new information without erasing previously…

Machine Learning · Computer Science 2026-01-05 Thomas Katraouras, Dimitrios Rafailidis

Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts

Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a…

Machine Learning · Computer Science 2026-03-31 Sijia Luo, Xiaokang Zhang, Yuxuan Hu, Bohan Zhang, Ke Wang, Jinbo Su, Mengshu Sun, Lei Liang, Jing Zhang

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science 2025-05-30 Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong

SparseLLM: Towards Global Pruning for Pre-trained Language Models

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity…

Computation and Language · Computer Science 2024-11-04 Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao

MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs

Language models deployed in real-world systems often require post-hoc updates to incorporate new or corrected knowledge. However, editing such models efficiently and reliably, without retraining or forgetting previous information, remains a…

Computation and Language · Computer Science 2026-02-03 Ke Wang, Yiming Qin, Nikolaos Dimitriadis, Alessandro Favero, Pascal Frossard

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

Parameter-Efficient Fine-tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, the fine-tuning performance with PEFT on complex, knowledge-intensive tasks is limited due to the…

Computation and Language · Computer Science 2024-06-10 Jitai Hao, WeiWei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren

Interweaving Memories of a Siamese Large Language Model

Parameter-efficient fine-tuning (PEFT) methods optimize large language models (LLMs) by modifying or introducing a small number of parameters to enhance alignment with downstream tasks. However, they can result in catastrophic forgetting,…

Computation and Language · Computer Science 2024-12-24 Xin Song, Zhikai Xue, Guoxiu He, Jiawei Liu, Wei Lu

STABLE: Gated Continual Learning for Large Language Models

Large language models (LLMs) increasingly require mechanisms for continual adaptation without full retraining. However, sequential updates can lead to catastrophic forgetting, where new edits degrade previously acquired knowledge. This work…

Machine Learning · Computer Science 2025-10-21 William Hoy, Nurcin Celik

Scaling Laws for Forgetting When Fine-Tuning Large Language Models

We study and quantify the problem of forgetting when fine-tuning pre-trained large language models (LLMs) on a downstream task. We find that parameter-efficient fine-tuning (PEFT) strategies, such as Low-Rank Adapters (LoRA), still suffer…

Computation and Language · Computer Science 2024-01-12 Damjan Kalajdzievski