Related papers: CCF: A Context Compression Framework for Efficient…

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning

Context-aware compression techniques have gained increasing attention as model sizes continue to grow, introducing computational bottlenecks that hinder efficient deployment. A structured encoding approach was proposed to selectively…

Computation and Language · Computer Science · 2025-02-13 · Barnaby Schmitt, Alistair Grosvenor, Matthias Cunningham, Clementine Walsh, Julius Pembrokeshire, Jonathan Teel

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a…

Computation and Language · Computer Science · 2024-06-11 · Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

Compressing Context to Enhance Inference Efficiency of Large Language Models

Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in…

Computation and Language · Computer Science · 2023-10-11 · Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Contextual Reinforcement in Multimodal Token Compression for Large Language Models

Effective token compression remains a critical challenge for scaling models to handle increasingly complex and diverse datasets. A novel mechanism based on contextual reinforcement is introduced, dynamically adjusting token importance…

Computation and Language · Computer Science · 2025-08-11 · Naderdel Piero, Zacharias Cromwell, Nathaniel Wainwright, Matthias Nethercott

Compressed Context Memory For Online Language Model Interaction

This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and…

Machine Learning · Computer Science · 2024-02-07 · Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song

In-Context Former: Lightning-fast Compressing Context for Large Language Model

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science · 2024-11-06 · Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

Neural Contextual Reinforcement Framework for Logical Structure Language Generation

The Neural Contextual Reinforcement Framework introduces an innovative approach to enhancing the logical coherence and structural consistency of text generated by large language models. Leveraging reinforcement learning principles, the…

Computation and Language · Computer Science · 2025-08-11 · Marcus Irvin, William Cooper, Edward Hughes, Jessica Morgan, Christopher Hamilton

CacheFormer: High Attention-Based Segment Caching

Efficiently handling long contexts in transformer-based language models with low perplexity is an active area of research. Numerous recent approaches like Linformer, Longformer, Performer, and Structured state space models (SSMs), have not…

Machine Learning · Computer Science · 2025-04-22 · Sushant Singh, Ausif Mahmood

Contextual Memory Reweaving in Large Language Models Using Layered Latent State Reconstruction

Memory retention challenges in deep neural architectures continue to limit the ability to process and recall extended contextual information. Token dependencies degrade as sequence length increases, leading to a decline in…

Computation and Language · Computer Science · 2025-03-26 · Frederick Dillon, Gregor Halvorsen, Simon Tattershall, Magnus Rowntree, Gareth Vanderpool

CompLLM: Compression for Long Context Q&A

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios

Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and are sensitive to the position of key information in long context scenarios. To address these…

Computation and Language · Computer Science · 2025-02-11 · Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Large language models (LLMs) have triggered a new stream of research focusing on compressing the context length to reduce the computational cost while ensuring the retention of helpful information for LLMs to answer the given question.…

Computation and Language · Computer Science · 2024-12-20 · Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke

Augmenting Language Models with Long-Term Memory

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science · 2023-06-13 · Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science · 2024-08-29 · Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of…

Computation and Language · Computer Science · 2025-05-23 · Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo

Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the…

Image and Video Processing · Electrical Eng. & Systems · 2023-12-01 · Yang Sui, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Bo Yuan, Zhenzhong Chen

Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning

Large Language Models (LLMs) face significant challenges in long-context processing, including quadratic computational costs, information forgetting, and the context fragmentation inherent in retrieval-augmented generation (RAG). We propose…

Computation and Language · Computer Science · 2026-02-10 · Zhuoen Chen, Dongfang Li, Meishan Zhang, Baotian Hu, Min Zhang

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

Repository-level code intelligence tasks require large language models (LLMs) to process long, multi-file contexts. Such inputs introduce three challenges: crucial context can be obscured by noise, truncated due to limited windows, and…

Software Engineering · Computer Science · 2026-04-16 · Jia Feng, Zhanyue Qin, Cuiyun Gao, Ruiqi Wang, Chaozheng Wang, Yingwei Ma, Xiaoyuan Xie