Related papers: Dodo: Dynamic Contextual Compression for Decoder-o…

Large Language Model as Token Compressor and Decompressor

In this paper, we study whether an off-the-shelf LLM can be adapted into a discrete, variable-length token compressor and decompressor for long-context processing. To this end, we design a self-expressive autoencoding framework that…

Computation and Language · Computer Science · 2026-05-14 · Wenbing Li, Yiran Wang, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Wei Yang

Adapting Language Models to Compress Contexts

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science · 2023-11-07 · Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

DAST: Context-Aware Compression in LLMs via Dynamic Allocation of Soft Tokens

Large Language Models (LLMs) face computational inefficiencies and redundant processing when handling long context inputs, prompting a focus on compression techniques. While existing semantic vector-based compression methods achieve…

Computation and Language · Computer Science · 2025-02-18 · Shaoshen Chen, Yangning Li, Zishan Xu, Yinghui Li, Xin Su, Zifei Shan, Hai-tao Zheng

LLoCO: Learning Long Contexts Offline

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a…

Computation and Language · Computer Science · 2024-10-18 · Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

ARC-Encoder: learning compressed text representations for large language models

Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches…

Computation and Language · Computer Science · 2025-10-24 · Hippolyte Pilchen, Edouard Grave, Patrick Pérez

Sentence-Anchored Gist Compression for Long-Context LLMs

This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned…

Computation and Language · Computer Science · 2025-11-12 · Dmitrii Tarasov, Elizaveta Goncharova, Kuznetsov Andrey

LoCoCo: Dropping In Convolutions for Long Context Compression

This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size…

Machine Learning · Computer Science · 2024-10-29 · Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

Latent Context Compilation: Distilling Long Context into Compact Portable Memory

Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires…

Machine Learning · Computer Science · 2026-02-26 · Zeju Li, Yizhou Zhou, Qiang Xu

ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs

Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by…

Computation and Language · Computer Science · 2026-02-04 · Xuancheng Li, Haitao Li, Yujia Zhou, Qingyao Ai, Yiqun Liu

Beyond Position Bias: Shifting Context Compression from Position-Driven to Semantic-Driven

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has…

Computation and Language · Computer Science · 2026-05-12 · Jiwei Tang, Zhijing Huang, Xinyu Zhang, Chen Jason Zhang, Jianxing Yu, Libin Zheng, Rui Meng, Jian Yin

Condenser: a Pre-training Architecture for Dense Retrieval

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science · 2021-09-22 · Luyu Gao, Jamie Callan

In-Context Former: Lightning-fast Compressing Context for Large Language Model

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science · 2024-11-06 · Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think

We study why continuous diffusion language models (DLMs) have lagged behind discrete diffusion approaches despite their appealing continuous generative dynamics. Under a controlled token-recovery study, we identify token rounding, the…

Computation and Language · Computer Science · 2026-03-04 · Junzhe Shen, Jieru Zhao, Ziwei He, Zhouhan Lin

Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to…

Computation and Language · Computer Science · 2026-03-30 · Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

DaMoC: Efficiently Selecting the Optimal Large Language Model for Fine-tuning Domain Tasks Based on Data and Model Compression

Large language models (LLMs) excel in general tasks but struggle with domain-specific ones, requiring fine-tuning with specific data. With many open-source LLMs available, selecting the best model for fine-tuning downstream tasks is…

Computation and Language · Computer Science · 2025-09-05 · Wei Huang, Huang Wei, Yinggui Wang

From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition

Managing extensive context remains a critical bottleneck for Large Language Models (LLMs), particularly in applications like long-document question answering and autonomous agents where lengthy inputs incur high computational costs and…

Computation and Language · Computer Science · 2026-01-06 · Yiqing Zhou, Yu Lei, Shuzheng Si, Qingyan Sun, Wei Wang, Yifei Wu, Hao Wen, Gang Chen, Fanchao Qi, Maosong Sun

Retrofitting Large Language Models with Dynamic Tokenization

Current language models (LMs) use a fixed, static subword tokenizer. This default choice typically results in degraded efficiency and language capabilities, especially in languages other than English. To address this issue, we challenge the…

Computation and Language · Computer Science · 2025-06-12 · Darius Feher, Ivan Vulić, Benjamin Minixhofer

CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning

Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by "Context Explosion", where the accumulation of long text…

Computation and Language · Computer Science · 2025-12-16 · Xuanzhang Liu, Jianglun Feng, Zhuoran Zhuang, Junzhe Zhao, Maofei Que, Jieting Li, Dianlei Wang, Hao Tong, Ye Chen, Pan Li