
Related papers: RAC: Relation-Aware Cache Replacement for Large La…

200 papers

Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance,…

Computation and Language · Computer Science 2025-11-06 Shuhang Lin, Zhencan Peng, Lingyao Li, Xiao Lin, Xi Zhu, Yongfeng Zhang

Large Language Models (LLMs) are revolutionizing how users interact with information systems, yet their high inference cost poses serious scalability and sustainability challenges. Caching inference responses, allowing them to be retrieved…

Machine Learning · Computer Science 2026-02-16 Xutong Liu, Baran Atalar, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, John C. S. Lui, Wei Chen, Carlee Joe-Wong

Mobile edge Large Language Model (LLM) deployments face inherent constraints, such as limited computational resources and network bandwidth. Although Retrieval-Augmented Generation (RAG) mitigates some challenges by integrating external…

Networking and Internet Architecture · Computer Science 2025-01-17 Guangyuan Liu, Yinqiu Liu, Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong

Large Language Models (LLMs) exhibit impressive results across a wide range of natural language processing (NLP) tasks, yet they can often produce factually incorrect outputs. This paper introduces a simple but effective low-latency…

Computation and Language · Computer Science 2024-10-22 Changmao Li, Jeffrey Flanigan

Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks by integrating the strengths of large language models (LLMs) and external knowledge databases. However, RAG introduces long…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-26 Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin

Large Language Models (LLMs) have become increasingly popular, transforming a wide range of applications across various domains. However, the real-world effectiveness of their query cache systems has not been thoroughly investigated. In…

Computation and Language · Computer Science 2024-06-04 Jiaxing Li, Chi Xu, Feng Wang, Isaac M von Riedemann, Cong Zhang, Jiangchuan Liu

Large language models (LLMs) have excelled in various applications, yet serving them at scale is challenging due to their substantial resource demands and high latency. Our real-world studies reveal that over 70% of user requests to LLMs…

Machine Learning · Computer Science 2025-09-05 Yifan Yu, Yu Gan, Nikhil Sarda, Lillian Tsai, Jiaming Shen, Yanqi Zhou, Arvind Krishnamurthy, Fan Lai, Henry M. Levy, David Culler

Large Language Models (LLMs) have demonstrated remarkable capabilities in leveraging extensive external knowledge to enhance responses in multi-turn and agentic applications, such as retrieval-augmented generation (RAG). However, processing…

Computation and Language · Computer Science 2025-10-14 Xiaoqiang Lin, Aritra Ghosh, Bryan Kian Hsiang Low, Anshumali Shrivastava, Vijai Mohan

Recent advances in Large Language Models (LLMs) have revolutionized web applications, enabling intelligent search, recommendation, and assistant services with natural language interfaces. Tool-calling extends LLMs with the ability to…

Software Engineering · Computer Science 2026-01-23 Yi Zhai, Dian Shen, Junzhou Luo, Bin Yang

Retrieval Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions. Previous research has predominantly focused on improving the…

Machine Learning · Computer Science 2025-01-07 Mohammad Hassan Heydari, Arshia Hemmat, Erfan Naman, Afsaneh Fatemi

Semantic caching significantly reduces computational costs and improves efficiency by storing and reusing large language model (LLM) responses. However, existing systems rely primarily on matching individual queries, lacking awareness of…

Computation and Language · Computer Science 2025-07-16 Jianxin Yan, Wangze Ni, Lei Chen, Xuemin Lin, Peng Cheng, Zhan Qin, Kui Ren

Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully…

Information Retrieval · Computer Science 2026-04-08 Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed

Memory caches are being aggressively used in today's data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies--notably the Least Recently Used (LRU) policy--that are…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-27 Yinghao Yu, Wei Wang, Jun Zhang, Khaled Ben Letaief

Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this…

Computation and Language · Computer Science 2025-03-06 Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng

Large language models (LLMs) rely on key-value cache (KV cache) to accelerate decoding by reducing redundant computations. However, the KV cache memory usage grows substantially with longer text sequences, posing challenges for efficient…

Computation and Language · Computer Science 2025-11-18 Yixuan Wang, Shiyu Ji, Yijun Liu, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che

Providing external knowledge to Large Language Models (LLMs) is a key point for using these models in real-world applications for several reasons, such as incorporating up-to-date content in a real-time manner, providing access to…

Computation and Language · Computer Science 2024-06-04 Simon Akesson, Frances A. Santos

Large Language Models (LLMs) have been integrated into recommendation systems to enhance user behavior comprehension. The Retrieval Augmented Generation (RAG) technique is further incorporated into these systems to retrieve more relevant…

Information Retrieval · Computer Science 2025-02-12 Jian Xu, Sichun Luo, Xiangyu Chen, Haoming Huang, Hanxu Hou, Linqi Song

Large language models (LLMs) are being widely researched across various disciplines, with significant recent efforts focusing on adapting LLMs for understanding of how communication networks operate. However, over-reliance on prompting…

Computation and Language · Computer Science 2024-10-22 Liujianfu Wang, Yuyang Du, Jingqi Lin, Kexin Chen, Soung Chang Liew

Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in…

Considering the limited internal parametric knowledge, retrieval-augmented generation (RAG) has been widely used to extend the knowledge scope of large language models (LLMs). Despite the extensive efforts on RAG research, in existing…

Computation and Language · Computer Science 2024-11-22 Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen