Related papers: Efficient Long Context Language Model Retrieval wi…

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs

While large language models (LLMs) excel in generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of…

Computation and Language · Computer Science · 2025-05-23 · Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science · 2024-08-29 · Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a…

Computation and Language · Computer Science · 2024-06-11 · Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation

Retrieving documents and prepending them in-context at inference time improves the performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more…

Computation and Language · Computer Science · 2023-10-09 · Fangyuan Xu, Weijia Shi, Eunsol Choi

CompLLM: Compression for Long Context Q&A

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

Compressing Context to Enhance Inference Efficiency of Large Language Models

Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in…

Computation and Language · Computer Science · 2023-10-11 · Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

ILRe: Intermediate Layer Retrieval for Context Compression in Causal Language Models

Large Language Models (LLMs) have demonstrated success across many benchmarks. However, they still exhibit limitations in long-context scenarios, primarily due to their short effective context length, quadratic computational complexity, and…

Computation and Language · Computer Science · 2025-09-26 · Manlai Liang, Mandi Liu, Jiangzhou Ji, Huaijun Li, Haobo Yang, Yaohan He, Jinlong Li

Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data

In-context learning has established itself as an important learning paradigm for Large Language Models (LLMs). In this paper, we demonstrate that LLMs can learn encoding keys in-context and perform analysis directly on encoded…

Computation and Language · Computer Science · 2026-04-16 · Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh

Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios

Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from a great deal of redundant information and are sensitive to the position of key information in long context scenarios. To address these…

Computation and Language · Computer Science · 2025-02-11 · Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng

In-Context Former: Lightning-fast Compressing Context for Large Language Model

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science · 2024-11-06 · Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models

Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform…

Computation and Language · Computer Science · 2025-06-10 · Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

Repository-level code intelligence tasks require large language models (LLMs) to process long, multi-file contexts. Such inputs introduce three challenges: crucial context can be obscured by noise, truncated due to limited windows, and…

Software Engineering · Computer Science · 2026-04-16 · Jia Feng, Zhanyue Qin, Cuiyun Gao, Ruiqi Wang, Chaozheng Wang, Yingwei Ma, Xiaoyuan Xie

LLM2IR: simple unsupervised contrastive learning makes long-context LLM great retriever

Modern dense information retrieval (IR) models usually rely on costly large-scale pretraining. In this paper, we introduce LLM2IR, an efficient unsupervised contrastive learning framework to convert any decoder-only large language model…

Information Retrieval · Computer Science · 2026-01-12 · Xiaocong Yang

Can't Remember Details in Long Documents? You Need Some R&R

Long-context large language models (LLMs) hold promise for tasks such as question-answering (QA) over long documents, but they tend to miss important information in the middle of context documents (arXiv:2307.03172v3). Here, we introduce…

Computation and Language · Computer Science · 2024-03-11 · Devanshu Agrawal, Shang Gao, Martin Gajek

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science · 2023-12-07 · Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

LLoCO: Learning Long Contexts Offline

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a…

Computation and Language · Computer Science · 2024-10-18 · Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Large Language Models (LLMs) achieve superior performance through Chain-of-Thought (CoT) reasoning, but these token-level reasoning chains are computationally expensive and inefficient. In this paper, we introduce Compressed Latent…

Computation and Language · Computer Science · 2026-02-04 · Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Ruihua Song, Jian Luan

Learning Contextual Retrieval for Robust Conversational Search

Effective conversational search demands a deep understanding of user intent across multiple dialogue turns. Users frequently use abbreviations and shift topics in the middle of conversations, posing challenges for conventional retrievers.…

Information Retrieval · Computer Science · 2025-09-25 · Seunghan Yang, Juntae Lee, Jihwan Bang, Kyuhong Shim, Minsoo Kim, Simyung Chang