Related papers: REFRAG: Rethinking RAG based Decoding

Long Context RAG Performance of Large Language Models

Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context…

Machine Learning · Computer Science · 2024-11-07 · Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, Michael Carbin

Does RAG Really Perform Bad For Long-Context Processing?

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective…

Computation and Language · Computer Science · 2025-02-18 · Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, Jun Zhao, Kang Liu

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information,…

Computation and Language · Computer Science · 2024-10-10 · Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic…

Computation and Language · Computer Science · 2024-05-28 · Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding

The emergence of long-context large language models (LLMs) offers a promising alternative to traditional retrieval-augmented generation (RAG) for processing extensive documents. However, the computational overhead of long-context inference…

Computation and Language · Computer Science · 2025-06-24 · Guanzheng Chen, Qilong Feng, Jinjie Ni, Xin Li, Michael Qizhe Shieh

MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation

Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can…

Computation and Language · Computer Science · 2025-04-10 · Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Defu Lian, Zhicheng Dou, Tiejun Huang

On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems

Retrieval-augmented generation (RAG) has emerged as an approach to augment large language models (LLMs) by reducing their reliance on static knowledge and improving answer factuality. RAG retrieves relevant context snippets and generates an…

Computation and Language · Computer Science · 2025-02-21 · Juraj Vladika, Florian Matthes

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation

The existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes…

Computation and Language · Computer Science · 2024-09-25 · Zheng Liu, Chenyuan Wu, Ninglu Shao, Shitao Xiao, Chaozhuo Li, Defu Lian

Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models

Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs), but existing methods often suffer from limited reasoning capabilities in effectively using the retrieved evidence,…

Computation and Language · Computer Science · 2024-10-03 · Shayekh Bin Islam, Md Asib Rahman, K S M Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md Rizwan Parvez

In Defense of RAG in the Era of Long-Context Language Models

Overcoming the limited context limitations in early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past. Recently, the emergence of long-context LLMs allows the…

Computation and Language · Computer Science · 2024-09-04 · Tan Yu, Anbang Xu, Rama Akkiraju

Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey

Large Language Models (LLMs) showcase remarkable abilities, yet they struggle with limitations such as hallucinations, outdated knowledge, opacity, and inexplicable reasoning. To address these challenges, Retrieval-Augmented Generation…

Computation and Language · Computer Science · 2024-10-03 · Sourav Verma

Grounding Long-Context Reasoning with Contextual Normalization for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) has become an essential approach for extending the reasoning and knowledge capacity of large language models (LLMs). While prior research has primarily focused on retrieval quality and prompting…

Computation and Language · Computer Science · 2025-12-09 · Jiamin Chen, Yuchen Li, Xinyu Ma, Xinran Chen, Xiaokun Zhang, Shuaiqiang Wang, Chen Ma, Dawei Yin

Inference Scaling for Long-Context Retrieval Augmented Generation

The scaling of inference computation has unlocked the potential of long-context large language models (LLMs) across diverse settings. For knowledge-intensive tasks, the increased compute is often allocated to incorporate more external…

Computation and Language · Computer Science · 2025-03-04 · Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky

LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing

Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this…

Computation and Language · Computer Science · 2025-03-06 · Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering (QA) tasks by leveraging relevant passages retrieved from corpora. In multilingual RAG (mRAG), the…

Computation and Language · Computer Science · 2025-12-12 · Jirui Qi, Raquel Fernández, Arianna Bisazza

Long Context vs. RAG for LLMs: An Evaluation and Revisits

Extending context windows (i.e., Long Context, LC) and using retrievers to selectively access relevant information (i.e., Retrieval-Augmented Generation, RAG) are the two main strategies to enable LLMs to incorporate extremely long external…

Computation and Language · Computer Science · 2025-01-06 · Xinze Li, Yixin Cao, Yubo Ma, Aixin Sun

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in…

Computation and Language · Computer Science · 2024-04-02 · Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long…

Computation and Language · Computer Science · 2024-10-18 · Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

CONFLARE: CONFormal LArge language model REtrieval

Retrieval-augmented generation (RAG) frameworks enable large language models (LLMs) to retrieve relevant information from a knowledge base and incorporate it into the context for generating responses. This mitigates hallucinations and…

Computation and Language · Computer Science · 2024-04-09 · Pouria Rouzrokh, Shahriar Faghani, Cooper U. Gamble, Moein Shariatnia, Bradley J. Erickson

M-RAG: Making RAG Faster, Stronger, and More Efficient

Retrieval-Augmented Generation (RAG) has become a widely adopted paradigm for enhancing the reliability of large language models (LLMs). However, RAG systems are sensitive to retrieval strategies that rely on text chunking to construct…

Information Retrieval · Computer Science · 2026-03-31 · Sun Xu, Tongkai Xu, Baiheng Xie, Li Huang, Qiang Gao, Kunpeng Zhang