Related papers: 4bit-Quantization in Vector-Embedding for RAG

Progressive Searching for Retrieval in RAG

Retrieval Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores documents as embedding vectors in a database. Given…

Information Retrieval · Computer Science · 2026-02-10 · Taehee Jeong, Xingzhe Zhao, Peizu Li, Markus Valvur, Weihua Zhao

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This…

Information Retrieval · Computer Science · 2026-05-19 · Yizheng Huang, Jimmy Huang

Benchmarking Large Language Models in Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different…

Computation and Language · Computer Science · 2023-12-21 · Jiawei Chen, Hongyu Lin, Xianpei Han, Le Sun

When Retrieval Succeeds and Fails: Rethinking Retrieval-Augmented Generation for LLMs

Large Language Models (LLMs) have enabled a wide range of applications through their powerful capabilities in language understanding and generation. However, as LLMs are trained on static corpora, they face difficulties in addressing…

Computation and Language · Computer Science · 2025-10-13 · Yongjie Wang, Yue Yu, Kaisong Song, Jun Lin, Zhiqi Shen

Parametric Retrieval Augmented Generation

Retrieval-augmented generation (RAG) techniques have emerged as a promising solution to enhance the reliability of large language models (LLMs) by addressing issues like hallucinations, outdated knowledge, and domain adaptation. In…

Computation and Language · Computer Science · 2025-01-28 · Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu

VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG

Retrieval-Augmented Generation (RAG) systems combine vector similarity search with large language models (LLMs) to deliver accurate, context-aware responses. However, co-locating the vector retriever and the LLM on shared GPU infrastructure…

Machine Learning · Computer Science · 2026-01-21 · Junkyum Kim, Divya Mahajan

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in…

Computation and Language · Computer Science · 2024-04-02 · Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by retrieving supporting documents into the prompt, but existing methods do not explicitly target queries that require fetching multiple documents with substantially…

Computation and Language · Computer Science · 2026-02-26 · Maciej Besta, Ales Kubicek, Robert Gerstenberger, Marcin Chrapek, Roman Niggli, Patrik Okanovic, Yi Zhu, Patrick Iff, Michal Podstawski, Lucas Weitzendorf, Mingyuan Chi, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Torsten Hoefler

Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report

Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development…

Computation and Language · Computer Science · 2024-02-06 · YuHe Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting

Retrieval-Augmented Generation for Large Language Models: A Survey

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a…

Computation and Language · Computer Science · 2024-03-28 · Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang

Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques

Retrieval-Augmented Generation enhances language models by retrieving relevant information from external knowledge bases, relying on high-dimensional vector embeddings typically stored in float32 precision. However, storing these embeddings…

Information Retrieval · Computer Science · 2025-05-02 · Naamán Huerga-Pérez, Rubén Álvarez, Rubén Ferrero-Guillén, Alberto Martínez-Gutiérrez, Javier Díez-González

Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation

Retrieval-augmented generation (RAG) has shown impressive capability in providing reliable answer predictions and addressing hallucination problems. A typical RAG implementation uses powerful retrieval models to extract external information…

Information Retrieval · Computer Science · 2024-11-19 · Ziwei Liu, Liang Zhang, Qian Li, Jianghua Wu, Guangxu Zhu

Re-ranking the Context for Multimodal Retrieval Augmented Generation

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context with improved accuracy and reduced hallucinations. However, multi-modal RAG systems face…

Machine Learning · Computer Science · 2025-01-09 · Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus

ELITE: Embedding-Less retrieval with Iterative Text Exploration

Large Language Models (LLMs) have achieved impressive progress in natural language processing, but their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks. Retrieval-Augmented…

Computation and Language · Computer Science · 2025-05-20 · Zhangyu Wang, Siyuan Gao, Rong Zhou, Hao Wang, Li Ning

Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey

Large Language Models (LLMs) showcase remarkable abilities, yet they struggle with limitations such as hallucinations, outdated knowledge, opacity, and inexplicable reasoning. To address these challenges, Retrieval-Augmented Generation…

Computation and Language · Computer Science · 2024-10-03 · Sourav Verma

Retrieval Augmented Generation and Representative Vector Summarization for large unstructured textual data in Medical Education

Large Language Models are increasingly being used for various tasks, including content generation and as chatbots. Despite their impressive performance on general tasks, LLMs need to be aligned when applied to domain-specific tasks to…

Computation and Language · Computer Science · 2023-08-02 · S. S. Manathunga, Y. A. Illangasekara

Corrective Retrieval Augmented Generation

Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable…

Computation and Language · Computer Science · 2024-10-08 · Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved…

Computation and Language · Computer Science · 2024-12-12 · Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen

GEM-RAG: Graphical Eigen Memories For Retrieval Augmented Generation

The ability to form, retrieve, and reason about memories in response to stimuli serves as the cornerstone for general intelligence, shaping entities capable of learning, adaptation, and intuitive insight. Large Language Models (LLMs) have…

Computation and Language · Computer Science · 2024-09-25 · Brendan Hogan Rappazzo, Yingheng Wang, Aaron Ferber, Carla Gomes

Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation

Large Language Models (LLMs) are smart but forgetful. Recent studies (e.g., Bubeck et al., 2023) on modern LLMs have shown that they are capable of performing amazing tasks typically necessitating human-level intelligence. However,…

Computation and Language · Computer Science · 2023-11-08 · Eric Melz