Related papers: Concise and Precise Context Compression for Tool-U…

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to…

Computation and Language · Computer Science · 2026-03-30 · Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

CompLLM: Compression for Long Context Q&A

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

Compressed Context Memory For Online Language Model Interaction

This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and…

Machine Learning · Computer Science · 2024-02-07 · Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles

Prompt compression condenses contexts while maintaining their informativeness for different usage scenarios. It not only shortens the inference time and reduces computational costs during the usage of large language models, but also lowers…

Computation and Language · Computer Science · 2024-10-21 · Xiao Pu, Tianxing He, Xiaojun Wan

Contextual Reinforcement in Multimodal Token Compression for Large Language Models

Effective token compression remains a critical challenge for scaling models to handle increasingly complex and diverse datasets. A novel mechanism based on contextual reinforcement is introduced, dynamically adjusting token importance…

Computation and Language · Computer Science · 2025-08-11 · Naderdel Piero, Zacharias Cromwell, Nathaniel Wainwright, Matthias Nethercott

A Comprehensive Survey of Compression Algorithms for Language Models

How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the…

Computation and Language · Computer Science · 2024-01-30 · Seungcheol Park, Jaehyeon Choi, Sojin Lee, U Kang

DAST: Context-Aware Compression in LLMs via Dynamic Allocation of Soft Tokens

Large Language Models (LLMs) face computational inefficiencies and redundant processing when handling long context inputs, prompting a focus on compression techniques. While existing semantic vector-based compression methods achieve…

Computation and Language · Computer Science · 2025-02-18 · Shaoshen Chen, Yangning Li, Zishan Xu, Yinghui Li, Xin Su, Zifei Shan, Hai-tao Zheng

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a…

Computation and Language · Computer Science · 2024-06-11 · Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

Repository-level code intelligence tasks require large language models (LLMs) to process long, multi-file contexts. Such inputs introduce three challenges: crucial context can be obscured by noise, truncated due to limited windows, and…

Software Engineering · Computer Science · 2026-04-16 · Jia Feng, Zhanyue Qin, Cuiyun Gao, Ruiqi Wang, Chaozheng Wang, Yingwei Ma, Xiaoyuan Xie

On Multilingual Encoder Language Model Compression for Low-Resource Languages

In this paper, we combine two-step knowledge distillation, structured pruning, truncation, and vocabulary trimming for extremely compressing multilingual encoder-only language models for low-resource languages. Our novel approach…

Computation and Language · Computer Science · 2025-11-07 · Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann

Adapting Language Models to Compress Contexts

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science · 2023-11-07 · Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Large language models (LLMs) have triggered a new stream of research focusing on compressing the context length to reduce the computational cost while ensuring the retention of helpful information for LLMs to answer the given question.…

Computation and Language · Computer Science · 2024-12-20 · Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke

Compressing Lengthy Context With UltraGist

Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context due to the innovative design…

Computation and Language · Computer Science · 2024-10-14 · Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

Compress the Context, Keep the Commitments: A Formal Framework for Verifiable LLM Context Compression

LLM context is not just tokens; it is a set of commitments. Long-running conversations accumulate goals, constraints, decisions, preferences, tool results, retrieved evidence, artifacts, and safety boundaries that future responses must…

Machine Learning · Computer Science · 2026-05-19 · Natalia Trukhina, Vadim Vashkelis

From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition

Managing extensive context remains a critical bottleneck for Large Language Models (LLMs), particularly in applications like long-document question answering and autonomous agents where lengthy inputs incur high computational costs and…

Computation and Language · Computer Science · 2026-01-06 · Yiqing Zhou, Yu Lei, Shuzheng Si, Qingyan Sun, Wei Wang, Yifei Wu, Hao Wen, Gang Chen, Fanchao Qi, Maosong Sun

CCF: A Context Compression Framework for Efficient Long-Sequence Language Modeling

Scaling language models to longer contexts is essential for capturing rich dependencies across extended discourse. However, naïve context extension imposes significant computational and memory burdens, often resulting in inefficiencies…

Computation and Language · Computer Science · 2026-02-03 · Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji

Sentence-Anchored Gist Compression for Long-Context LLMs

This work investigates context compression for Large Language Models (LLMs) using learned compression tokens to reduce the memory and computational demands of processing long sequences. We demonstrate that pre-trained LLMs can be fine-tuned…

Computation and Language · Computer Science · 2025-11-12 · Dmitrii Tarasov, Elizaveta Goncharova, Kuznetsov Andrey

FocusLLM: Precise Understanding of Long Context by Dynamic Condensing

Empowering LLMs with the ability to precisely understand long contexts is crucial for many downstream applications. However, handling long contexts with conventional transformer architecture requires substantial training and inference…

Computation and Language · Computer Science · 2024-12-24 · Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang