Related papers: Do Compressed LLMs Forget Knowledge? An Experiment…

Understanding and Improving Information Preservation in Prompt Compression for LLMs

Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational…

Computation and Language · Computer Science · 2025-10-13 · Weronika Łajewska, Momchil Hardalov, Laura Aina, Neha Anna John, Hang Su, Lluís Màrquez

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science · 2023-12-07 · Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt

While the numerous parameters in Large Language Models (LLMs) contribute to their superior performance, this massive scale makes them inefficient and memory-hungry. Thus, they are hard to deploy on commodity hardware, such as one single…

Computation and Language · Computer Science · 2023-10-11 · Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava

Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs

Large Language Models (LLMs) are widely used for temporal prediction, but their reliance on pretraining data raises contamination concerns, as accurate predictions on pre-cutoff test data may reflect memorization rather than reasoning,…

Computation and Language · Computer Science · 2025-10-16 · Xin Gao, Ruiyi Zhang, Daniel Du, Saurabh Mahindre, Sai Ashish Somayajula, Pengtao Xie

An Empirical Study on Prompt Compression for Large Language Models

Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression…

Computation and Language · Computer Science · 2025-05-02 · Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang

Understanding the Dilemma of Unlearning for Large Language Models

Unlearning seeks to remove specific knowledge from large language models (LLMs), but its effectiveness remains contested. On one side, "forgotten" knowledge can often be recovered through interventions such as light fine-tuning; on the…

Computation and Language · Computer Science · 2025-09-30 · Qingjie Zhang, Haoting Qian, Zhicong Huang, Cheng Hong, Minlie Huang, Ke Xu, Chao Zhang, Han Qiu

Learning is Forgetting: LLM Training As Lossy Compression

Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn or relate them to…

Machine Learning · Computer Science · 2026-04-10 · Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant

Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Prompt compression reduces inference cost and context length in large language models, but prior evaluations focus primarily on autoregressive architectures. This study investigates whether prompt compression transfers effectively to…

Computation and Language · Computer Science · 2026-05-19 · Sterling Huang, Abigayle Brown, Jiyoo Noh, Jiakang Xu, Wantong Huo, Kaung Myat Kyaw, Jonathan Chan

LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning

Large language models (LLMs) sometimes demonstrate poor performance on knowledge-intensive tasks; commonsense reasoning is one of them. Researchers typically address these issues by retrieving related knowledge from knowledge graphs or…

Computation and Language · Computer Science · 2024-10-15 · Jiachun Li, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Kang Liu, Xiaojian Jiang, Jiexin Xu, Jun Zhao

Comparing Knowledge Injection Methods for LLMs in a Low-Resource Regime

Large language models (LLMs) often require vast amounts of text to effectively acquire new knowledge. While continuing pre-training on large corpora or employing retrieval-augmented generation (RAG) has proven successful, updating an LLM…

Computation and Language · Computer Science · 2025-08-11 · Hugo Abonizio, Thales Almeida, Roberto Lotufo, Rodrigo Nogueira

Compression Represents Intelligence Linearly

There is a belief that learning to compress well will lead to intelligence. Recently, language modeling has been shown to be equivalent to compression, which offers a compelling rationale for the success of large language models (LLMs): the…

Computation and Language · Computer Science · 2024-08-20 · Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He

Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference

With the wide adoption of language models for IR (and specifically RAG systems), the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead to large prompts and, therefore, compute…

Information Retrieval · Computer Science · 2026-04-06 · Cornelius Kummer, Lena Jurkschat, Michael Färber, Sahar Vahdati

Compression Laws for Large Language Models

We introduce compression laws for large language models (LLMs). While recent scaling laws have sought to understand how LLMs scale with respect to model size, pre-training data, and computational resources, we focus on understanding how…

Computation and Language · Computer Science · 2025-04-08 · Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models

Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences. Due to their massive size and the high costs associated with training, LLMs are not suitable for frequent retraining. However,…

Computation and Language · Computer Science · 2024-12-11 · Dongfang Li, Zetian Sun, Xinshuo Hu, Baotian Hu, Min Zhang

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science · 2024-08-29 · Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Compression methods, including quantization, distillation, and pruning, improve the computational efficiency of large reasoning models (LRMs). However, existing studies either fail to sufficiently compare all three compression methods on…

Machine Learning · Computer Science · 2026-03-03 · Nan Zhang, Eugene Kwek, Yusen Zhang, Ngoc-Hieu Nguyen, Prasenjit Mitra, Rui Zhang

Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting

Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks. However, these models are often difficult to deploy due to significant computational requirements and…

Computation and Language · Computer Science · 2024-12-25 · Vijay Goyal, Mustafa Khan, Aprameya Tirupati, Harveer Saini, Michael Lam, Kevin Zhu

Prompt Compression for Large Language Models: A Survey

Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these…

Computation and Language · Computer Science · 2024-10-18 · Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier

Unlocking Memorization in Large Language Models with Dynamic Soft Prompting

Pretrained large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation. However, LLMs pose significant security risks due to their tendency to memorize…

Computation and Language · Computer Science · 2024-09-24 · Zhepeng Wang, Runxue Bao, Yawen Wu, Jackson Taylor, Cao Xiao, Feng Zheng, Weiwen Jiang, Shangqian Gao, Yanfu Zhang

Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

Large language models deliver strong generative performance but at the cost of massive parameter counts, memory use, and decoding latency. Prior work has shown that pruning and structured sparsity can preserve accuracy under substantial…

Computation and Language · Computer Science · 2026-04-17 · Andrew Kiruluta