
Related papers: Dodo: Dynamic Contextual Compression for Decoder-o…

200 papers

In this paper, we study whether an off-the-shelf LLM can be adapted into a discrete, variable-length token compressor and decompressor for long-context processing. To this end, we design a self-expressive autoencoding framework that…

Computation and Language · Computer Science · 2026-05-14 · Wenbing Li, Yiran Wang, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Wei Yang

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science · 2023-11-07 · Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

Large Language Models (LLMs) face computational inefficiencies and redundant processing when handling long context inputs, prompting a focus on compression techniques. While existing semantic vector-based compression methods achieve…

Computation and Language · Computer Science · 2025-02-18 · Shaoshen Chen, Yangning Li, Zishan Xu, Yinghui Li, Xin Su, Zifei Shan, Hai-tao Zheng

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a…

Computation and Language · Computer Science · 2024-10-18 · Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches…

Computation and Language · Computer Science · 2025-10-24 · Hippolyte Pilchen, Edouard Grave, Patrick Pérez

This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned…

Computation and Language · Computer Science · 2025-11-12 · Dmitrii Tarasov, Elizaveta Goncharova, Kuznetsov Andrey

This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size…

Machine Learning · Computer Science · 2024-10-29 · Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires…

Machine Learning · Computer Science · 2026-02-26 · Zeju Li, Yizhou Zhou, Qiang Xu

Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by…

Computation and Language · Computer Science · 2026-02-04 · Xuancheng Li, Haitao Li, Yujia Zhou, Qingyao Ai, Yiqun Liu

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has…

Computation and Language · Computer Science · 2026-05-12 · Jiwei Tang, Zhijing Huang, Xinyu Zhang, Chen Jason Zhang, Jianxing Yu, Libin Zheng, Rui Meng, Jian Yin

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science · 2021-09-22 · Luyu Gao, Jamie Callan

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science · 2024-11-06 · Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

We study why continuous diffusion language models (DLMs) have lagged behind discrete diffusion approaches despite their appealing continuous generative dynamics. Under a controlled token-recovery study, we identify token rounding, the…

Computation and Language · Computer Science · 2026-03-04 · Junzhe Shen, Jieru Zhao, Ziwei He, Zhouhan Lin

Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to…

Computation and Language · Computer Science · 2026-03-30 · Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Large language models (LLMs) excel in general tasks but struggle with domain-specific ones, requiring fine-tuning with specific data. With many open-source LLMs available, selecting the best model for fine-tuning downstream tasks is…

Computation and Language · Computer Science · 2025-09-05 · Wei Huang, Huang Wei, Yinggui Wang

Managing extensive context remains a critical bottleneck for Large Language Models (LLMs), particularly in applications like long-document question answering and autonomous agents where lengthy inputs incur high computational costs and…

Computation and Language · Computer Science · 2026-01-06 · Yiqing Zhou, Yu Lei, Shuzheng Si, Qingyan Sun, Wei Wang, Yifei Wu, Hao Wen, Gang Chen, Fanchao Qi, Maosong Sun

Current language models (LMs) use a fixed, static subword tokenizer. This default choice typically results in degraded efficiency and language capabilities, especially in languages other than English. To address this issue, we challenge the…

Computation and Language · Computer Science · 2025-06-12 · Darius Feher, Ivan Vulić, Benjamin Minixhofer

Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by "Context Explosion", where the accumulation of long text…

Computation and Language · Computer Science · 2025-12-16 · Xuanzhang Liu, Jianglun Feng, Zhuoran Zhuang, Junzhe Zhao, Maofei Que, Jieting Li, Dianlei Wang, Hao Tong, Ye Chen, Pan Li