Related papers: ATACompressor: Adaptive Task-Aware Compression for…

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation

Retrieval-augmented generation improves the factual accuracy of Large Language Models (LLMs) by incorporating external context, but often suffers from irrelevant retrieved content that hinders effectiveness. Context compression addresses…

Computation and Language · Computer Science · 2025-09-23 · Lvzhou Luo, Yixuan Cao, Ping Luo

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Large language models (LLMs) are increasingly deployed as agents in dynamic, real-world environments, where success requires both reasoning and effective tool use. A central challenge for agentic tasks is the growing context length, as…

Artificial Intelligence · Computer Science · 2025-10-20 · Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan

Long Context Compression with Activation Beacon

Long context compression is a critical research problem due to its significance in reducing the high computational and memory costs associated with LLMs. In this paper, we propose Activation Beacon, a plug-in module for transformer-based…

Computation and Language · Computer Science · 2024-10-14 · Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

Adapting Language Models to Compress Contexts

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science · 2023-11-07 · Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science · 2024-08-29 · Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

Sentence-Anchored Gist Compression for Long-Context LLMs

This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned…

Computation and Language · Computer Science · 2025-11-12 · Dmitrii Tarasov, Elizaveta Goncharova, Kuznetsov Andrey

AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees

The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while…

Computation and Language · Computer Science · 2025-12-05 · Yangning Li, Shaoshen Chen, Yinghui Li, Yankai Chen, Hai-Tao Zheng, Hui Wang, Wenhao Jiang, Philip S. Yu

Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios

Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and are sensitive to the position of key information in long context scenarios. To address these…

Computation and Language · Computer Science · 2025-02-11 · Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng

DAST: Context-Aware Compression in LLMs via Dynamic Allocation of Soft Tokens

Large Language Models (LLMs) face computational inefficiencies and redundant processing when handling long context inputs, prompting a focus on compression techniques. While existing semantic vector-based compression methods achieve…

Computation and Language · Computer Science · 2025-02-18 · Shaoshen Chen, Yangning Li, Zishan Xu, Yinghui Li, Xin Su, Zifei Shan, Hai-tao Zheng

CompLLM: Compression for Long Context Q&A

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

ARC-Encoder: learning compressed text representations for large language models

Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches…

Computation and Language · Computer Science · 2025-10-24 · Hippolyte Pilchen, Edouard Grave, Patrick Pérez

Latent Context Compilation: Distilling Long Context into Compact Portable Memory

Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires…

Machine Learning · Computer Science · 2026-02-26 · Zeju Li, Yizhou Zhou, Qiang Xu

In-Context Former: Lightning-fast Compressing Context for Large Language Model

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science · 2024-11-06 · Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Large language models (LLMs) have triggered a new stream of research focusing on compressing the context length to reduce the computational cost while ensuring the retention of helpful information for LLMs to answer the given question.…

Computation and Language · Computer Science · 2024-12-20 · Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Context compression aims to shorten long context inputs with minimal information loss for LLM inference acceleration. While existing methods have shown promise, they typically rely on complex compression modules or compression-specific…

Artificial Intelligence · Computer Science · 2026-05-28 · Guoxin Ma, Yibing Liu, Chengzhengxu Li, Yu Liang, Yan Wang, Yueyang Zhang, Kecheng Chen, Zhaohan Zhang, Zhiyuan Sun, Daiting Shi

Understanding and Improving Information Preservation in Prompt Compression for LLMs

Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational…

Computation and Language · Computer Science · 2025-10-13 · Weronika Łajewska, Momchil Hardalov, Laura Aina, Neha Anna John, Hang Su, Lluís Màrquez

Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission

Long-context LLM agents often struggle with growing token, memory, and latency costs, making efficient context compression essential for practical deployment. Existing LLM-as-a-compressor methods remain noticeably inferior to using the full…

Computation and Language · Computer Science · 2026-05-22 · Jiangnan Ye, Hanqi Yan, Zhenyi Shen, Heng Chang, Ye Mao, Yulan He

In-context Autoencoder for Context Compression in a Large Language Model

We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first…

Computation and Language · Computer Science · 2024-05-10 · Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han