Related papers: A Statistical Framework for Data-dependent Retriev…

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as…

Computation and Language · Computer Science 2021-12-07 Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama

Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

In the field of language modeling, models augmented with retrieval components have emerged as a promising solution to address several challenges faced in the natural language processing (NLP) field, including knowledge grounding,…

Machine Learning · Computer Science 2024-10-22 To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani

LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

Recently, embedding-based retrieval, or dense retrieval, has shown state-of-the-art results compared with traditional sparse or bag-of-words-based approaches. This paper introduces a model-agnostic doc-level embedding framework through large…

Information Retrieval · Computer Science 2024-04-10 Mingrui Wu, Sheng Cao

Retrieval Augmentation for Deep Neural Networks

Deep neural networks have achieved state-of-the-art results in various vision and/or language tasks. Despite the use of large training datasets, most models are trained by iterating over single input-output pairs, discarding the remaining…

Computation and Language · Computer Science 2021-04-27 Rita Parada Ramos, Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho, Bruno Martins

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the…

Machine Learning · Computer Science 2023-07-07 Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang

Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval

Compact dual-encoder models are widely used for retrieval owing to their efficiency and scalability. However, such models often underperform compared to their Large Language Model (LLM)-based retrieval counterparts, likely due to their…

Information Retrieval · Computer Science 2025-09-23 Pranjal A. Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma

Reliable, Adaptable, and Attributable Language Models with Retrieval

Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data…

Computation and Language · Computer Science 2024-03-06 Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih

More Room for Language: Investigating the Effect of Retrieval on Language Models

Retrieval-augmented language models pose a promising alternative to standard language modeling. During pretraining, these models search in a corpus of documents for contextually relevant information that could aid the language modeling…

Computation and Language · Computer Science 2024-04-18 David Samuel, Lucas Georges Gabriel Charpentier, Sondre Wold

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance…

Computation and Language · Computer Science 2024-05-07 Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant

A Survey on Data Augmentation in Large Model Era

Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence, garnering significant interest from both academic and industrial spheres. However, the training of…

Machine Learning · Computer Science 2024-03-05 Yue Zhou, Chenlu Guo, Xu Wang, Yi Chang, Yuan Wu

Data Augmentation for Sample Efficient and Robust Document Ranking

Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even…

Information Retrieval · Computer Science 2023-11-28 Abhijit Anand, Jurek Leonhardt, Jaspreet Singh, Koustav Rudra, Avishek Anand

Redefining Information Retrieval of Structured Database via Large Language Models

Retrieval augmentation is critical when Language Models (LMs) exploit non-parametric knowledge related to the query through external knowledge bases before reasoning. The retrieved information is incorporated into LMs as context alongside…

Information Retrieval · Computer Science 2024-11-21 Mingzhu Wang, Yuzhe Zhang, Qihang Zhao, Junyi Yang, Hong Zhang

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory…

Computation and Language · Computer Science 2023-01-24 Wenhu Chen, Pat Verga, Michiel de Jong, John Wieting, William Cohen

Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation

Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching;…

Information Retrieval · Computer Science 2025-06-24 Jingming Liu, Yumeng Li, Wei Shi, Yao-Xiang Ding, Hui Su, Kun Zhou

Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth

Reasoning models have gained significant attention due to their strong performance, particularly when enhanced with retrieval augmentation. However, these models often incur high computational costs, as both retrieval and reasoning tokens…

Computation and Language · Computer Science 2025-10-20 Helia Hashemi, Victor Rühle, Saravan Rajmohan

Retrieval-Based Transformer for Table Augmentation

Data preparation, also called data wrangling, is considered one of the most expensive and time-consuming steps when performing analytics or building machine learning models. Preparing data typically involves collecting and merging data from…

Computation and Language · Computer Science 2023-06-22 Michael Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, Alfio Gliozzo

Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as…

Computation and Language · Computer Science 2024-03-29 Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

Retrieval-Enhanced Machine Learning

Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models.…

Machine Learning · Computer Science 2022-05-04 Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

A Multi-Task Embedder For Retrieval Augmented LLMs

LLMs confront inherent limitations in terms of their knowledge, memory, and action. Retrieval augmentation stands as a vital mechanism to address these limitations, bringing in useful information from external sources to augment the…

Information Retrieval · Computer Science 2026-01-06 Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie