Related papers: Adapting LLMs for Efficient Context Processing thr…

CompLLM: Compression for Long Context Q&A

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key…

Computation and Language · Computer Science · 2024-08-13 · Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Extending Context Window of Large Language Models via Semantic Compression

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science · 2023-12-18 · Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Prompt Compression for Large Language Models: A Survey

Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these…

Computation and Language · Computer Science · 2024-10-18 · Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier

Adapting Language Models to Compress Contexts

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science · 2023-11-07 · Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science · 2023-12-07 · Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles

Prompt compression condenses contexts while maintaining their informativeness for different usage scenarios. It not only shortens the inference time and reduces computational costs during the usage of large language models, but also lowers…

Computation and Language · Computer Science · 2024-10-21 · Xiao Pu, Tianxing He, Xiaojun Wan

Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Payal Fofadiya, Sunil Tiwari

Learning to Compress Prompt in Natural Language Formats

Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the…

Computation and Language · Computer Science · 2024-04-03 · Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu

SCOPE: A Generative Approach for LLM Prompt Compression

Prompt compression methods enhance the efficiency of Large Language Models (LLMs) and minimize the cost by reducing the length of input context. The goal of prompt compression is to shorten the LLM prompt while maintaining a high generation…

Computation and Language · Computer Science · 2025-08-25 · Tinghui Zhang, Yifan Wang, Daisy Zhe Wang

Dynamic Compressing Prompts for Efficient Inference of Large Language Models

Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can…

Computation and Language · Computer Science · 2025-04-16 · Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du

An Empirical Study on Prompt Compression for Large Language Models

Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks. However, lengthy prompts significantly increase computational complexity and economic costs. To address this issue, we study six prompt compression…

Computation and Language · Computer Science · 2025-05-02 · Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang

GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment

Large Language Models (LLMs) have achieved remarkable performance across a wide range of Natural Language Processing (NLP) tasks. However, in long-context scenarios, they face two challenges: high computational cost and information…

Computation and Language · Computer Science · 2026-02-10 · Jiwei Tang, Zhicheng Zhang, Shunlong Wu, Jingheng Ye, Lichen Bai, Zitai Wang, Tingwei Lu, Lin Hai, Yiming Zhao, Hai-Tao Zheng, Hong-Gee Kim

Compressing Context to Enhance Inference Efficiency of Large Language Models

Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in…

Computation and Language · Computer Science · 2023-10-11 · Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows

Large Language Models (LLMs) deliver powerful reasoning and generation capabilities but incur substantial run-time costs when operating in agentic workflows that chain together lengthy prompts and process rich data streams. We introduce…

Artificial Intelligence · Computer Science · 2025-10-22 · Joong Ho Choi, Jiayang Zhao, Jeel Shah, Ritvika Sonawane, Vedant Singh, Avani Appalla, Will Flanagan, Filipe Condessa

FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing

The rapid advancement of Large Language Models (LLMs) has spurred significant progress in Large Speech-Language Models (LSLMs), enhancing their capabilities in both speech understanding and generation. While existing LSLMs often concentrate…

Computation and Language · Computer Science · 2025-11-03 · Shoutao Guo, Shaolei Zhang, Qingkai Fang, Zhengrui Ma, Min Zhang, Yang Feng

Understanding and Improving Information Preservation in Prompt Compression for LLMs

Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational…

Computation and Language · Computer Science · 2025-10-13 · Weronika Łajewska, Momchil Hardalov, Laura Aina, Neha Anna John, Hang Su, Lluís Màrquez

Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science · 2024-08-29 · Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications

The rapid increase in unstructured data across various fields has made multi-document comprehension and summarization a critical task. Traditional approaches often fail to capture relevant context, maintain logical consistency, and extract…

Computation and Language · Computer Science · 2024-09-30 · Aditi Godbole, Jabin Geevarghese George, Smita Shandilya

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Large language models (LLMs) are increasingly deployed as agents in dynamic, real-world environments, where success requires both reasoning and effective tool use. A central challenge for agentic tasks is the growing context length, as…

Artificial Intelligence · Computer Science · 2025-10-20 · Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan