Related papers: Neurocache: Efficient Vector Retrieval for Long-ra…

200 papers

A critical approach for efficiently deploying computationally demanding large language models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of previously generated tokens, significantly reducing the need for…

Computation and Language · Computer Science 2024-09-10 Akide Liu , Jing Liu , Zizheng Pan , Yefei He , Gholamreza Haffari , Bohan Zhuang

We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them…

Computation and Language · Computer Science 2016-12-15 Edouard Grave , Armand Joulin , Nicolas Usunier

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention…

Computation and Language · Computer Science 2024-09-02 Weijie Liu , Zecheng Tang , Juntao Li , Kehai Chen , Min Zhang

Large Language Models (LLMs) have had a profound impact on AI applications, particularly in the domains of long-text comprehension and generation. KV Cache technology is one of the most widely used techniques in the industry. It ensures…

Computation and Language · Computer Science 2024-04-30 Qiaozhi He , Zhihua Wu

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science 2023-06-13 Weizhi Wang , Li Dong , Hao Cheng , Xiaodong Liu , Xifeng Yan , Jianfeng Gao , Furu Wei

Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models only capture the local context, of up to a few…

Machine Learning · Computer Science 2017-11-08 Edouard Grave , Moustapha Cisse , Armand Joulin

A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency and better predict rare words from long contexts. This approach is a simpler alternative to attention-based pointer mechanism that…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-30 Ke Li , Daniel Povey , Sanjeev Khudanpur

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus…

Machine Learning · Computer Science 2022-03-18 Yuhuai Wu , Markus N. Rabe , DeLesley Hutchins , Christian Szegedy

Transformer-based large language models (LLMs) have already achieved remarkable results on long-text tasks, but the limited GPU memory (VRAM) resources struggle to accommodate the linearly growing demand for key-value (KV) cache as the…

Computation and Language · Computer Science 2025-03-21 Shibo Jie , Yehui Tang , Kai Han , Zhi-Hong Deng , Jing Han

Recent advancements in Large Language Models (LLMs) have spurred interest in numerous applications requiring robust long-range capabilities, essential for processing extensive input contexts and continuously generating extended outputs. As…

Machine Learning · Computer Science 2025-07-22 Dachuan Shi , Yonggan Fu , Xiangchi Yuan , Zhongzhi Yu , Haoran You , Sixu Li , Xin Dong , Jan Kautz , Pavlo Molchanov , Yingyan Lin

Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem. For this purpose, we test Larimar, a…

Computation and Language · Computer Science 2024-07-15 Elliot Nelson , Georgios Kollias , Payel Das , Subhajit Chaudhury , Soham Dan

Semantic caching significantly reduces computational costs and improves efficiency by storing and reusing large language model (LLM) responses. However, existing systems rely primarily on matching individual queries, lacking awareness of…

Computation and Language · Computer Science 2025-07-16 Jianxin Yan , Wangze Ni , Lei Chen , Xuemin Lin , Peng Cheng , Zhan Qin , Kui Ren

Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the…

Artificial Intelligence · Computer Science 2025-07-31 Haoyang Li , Yiming Li , Anxin Tian , Tianhao Tang , Zhanchao Xu , Xuejia Chen , Nicole Hu , Wei Dong , Qing Li , Lei Chen

Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training…

Computation and Language · Computer Science 2025-06-16 Runheng Liu , Xingchen Xiao , Heyan Huang , Zewen Chi , Zhijing Wu

$K$-nearest neighbor language models ($k$NN-LMs), which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling as well as downstream NLP benchmarks. These results have led researchers to…

Computation and Language · Computer Science 2024-08-22 Shangyi Geng , Wenting Zhao , Alexander M Rush

Large Language Models (LLMs) excel across a variety of language tasks yet are constrained by limited input lengths and high computational costs. Existing approaches, such as relative positional encodings (e.g., RoPE, ALiBi) and…

Computation and Language · Computer Science 2025-02-18 Kun-Hui Lee , Eunhwan Park , Donghoon Han , Seung-Hoon Na

Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations…

Computation and Language · Computer Science 2023-01-18 Frank F. Xu , Uri Alon , Graham Neubig

In this paper, we rethink sparse lexical representations for image retrieval. By utilizing multi-modal large language models (M-LLMs) that support visual prompting, we can extract image features and convert them into textual data, enabling…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Kengo Nakata , Daisuke Miyashita , Youyang Ng , Yasuto Hoshi , Jun Deguchi

We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when…

Computation and Language · Computer Science 2018-05-15 Urvashi Khandelwal , He He , Peng Qi , Dan Jurafsky

Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time…

Computation and Language · Computer Science 2024-06-27 Zhongwei Wan , Ziang Wu , Che Liu , Jinfa Huang , Zhihong Zhu , Peng Jin , Longyue Wang , Li Yuan