
Related papers: Compressed Context Memory For Online Language Mode…

200 papers

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Payal Fofadiya, Sunil Tiwari

To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a…

Computation and Language · Computer Science 2024-06-11 Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long…

Computation and Language · Computer Science 2023-12-18 Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

Large Language Models (LLMs) have made significant strides in natural language processing and generation, yet their ability to handle long-context input remains constrained by the quadratic complexity of attention computation and…

Computation and Language · Computer Science 2025-06-16 Manlai Liang, Wanyi Huang, Mandi Liu, Huaijun Li, Jinlong Li

The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe…

Computation and Language · Computer Science 2023-10-17 Siyu Ren, Qi Jia, Kenny Q. Zhu

Large Language Models (LLMs) need to adapt to the continuous changes in data, tasks, and user preferences. Due to their massive size and the high costs associated with training, LLMs are not suitable for frequent retraining. However,…

Computation and Language · Computer Science 2024-12-11 Dongfang Li, Zetian Sun, Xinshuo Hu, Baotian Hu, Min Zhang

Scaling language models to longer contexts is essential for capturing rich dependencies across extended discourse. However, naïve context extension imposes significant computational and memory burdens, often resulting in inefficiencies…

Computation and Language · Computer Science 2026-02-03 Wenhao Li, Bangcheng Sun, Weihao Ye, Tianyi Zhang, Daohai Yu, Fei Chao, Rongrong Ji

Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires…

Machine Learning · Computer Science 2026-02-26 Zeju Li, Yizhou Zhou, Qiang Xu

With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods…

Computation and Language · Computer Science 2024-11-06 Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

Large Language Models (LLMs) have garnered widespread attention due to their remarkable performance across various tasks. However, to mitigate the issue of hallucinations, LLMs often incorporate a retrieval-augmented pipeline to provide them…

Computation and Language · Computer Science 2024-08-29 Haowen Hou, Fei Ma, Binwen Bai, Xinxin Zhu, Fei Yu

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs…

Computation and Language · Computer Science 2023-12-07 Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool,…

Computation and Language · Computer Science 2024-07-03 Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

Large language models (LLMs) often rely on user-specific memories distilled from past interactions to enable personalized generation. A common practice is to concatenate these memories with the input prompt, but this approach quickly…

Computation and Language · Computer Science 2026-01-27 Ondrej Bohdal, Pramit Saha, Umberto Michieli, Mete Ozay, Taha Ceritli

Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in…

Computation and Language · Computer Science 2023-10-11 Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt…

Computation and Language · Computer Science 2023-11-07 Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

Large Language Models (LLMs) face limitations due to the high demand on GPU memory and computational resources when handling long contexts. While sparsifying the Key-Value (KV) cache of transformer models is a typical strategy to alleviate…

Machine Learning · Computer Science 2024-02-06 Yumeng Wang, Zhenyang Xiao

Recent large language model applications, such as Retrieval-Augmented Generation and chatbots, have led to an increased need to process longer input contexts. However, this requirement is hampered by inherent limitations. Architecturally,…

Artificial Intelligence · Computer Science 2024-08-14 Giulio Corallo, Paolo Papotti

Prompt compression condenses contexts while maintaining their informativeness for different usage scenarios. It not only shortens the inference time and reduces computational costs during the usage of large language models, but also lowers…

Computation and Language · Computer Science 2024-10-21 Xiao Pu, Tianxing He, Xiaojun Wan

Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of…

Computation and Language · Computer Science 2024-11-05 Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan

Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After information processing of the preceding…

Computation and Language · Computer Science 2016-10-14 Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin