Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Sebastian Hofstätter; Jiecao Chen; Karthik Raman; Hamed Zamani

Computation and Language 2022-07-08 v1 Information Retrieval

Abstract

This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by exploiting a distinct property of knowledge-intensive generation: the connection of query-answer pairs to items in the knowledge base. We filter training examples by applying a confidence threshold to the relevance labels, which indicate whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves over competitive baselines on two strongly imbalanced tasks, and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate that our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.
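
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TrainingExample` type, the `relevance_confidence` field, the `filter_by_relevance` helper, the example data, and the threshold value of 0.5 are all hypothetical, and the abstract does not say how the confidence scores are produced. The sketch only shows the general idea of keeping query-answer pairs whose relevance label meets a confidence threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingExample:
    """Hypothetical container for one query-answer pair from a multi-task training set."""
    task: str
    query: str
    answer: str
    # Assumed score in [0, 1]: how confident we are that the knowledge base
    # contains a passage from which this pair can be answered.
    relevance_confidence: float


def filter_by_relevance(
    examples: List[TrainingExample], threshold: float
) -> List[TrainingExample]:
    """Keep only examples whose relevance confidence meets the threshold."""
    return [ex for ex in examples if ex.relevance_confidence >= threshold]


# Toy data for illustration only; task names and scores are made up.
examples = [
    TrainingExample("qa", "Who wrote Hamlet?", "William Shakespeare", 0.9),
    TrainingExample("dialogue", "Tell me more.", "Sure, here is more.", 0.2),
    TrainingExample("fact-checking", "Paris is in France.", "SUPPORTS", 0.5),
]

# The threshold of 0.5 is an arbitrary choice for this sketch.
kept = filter_by_relevance(examples, threshold=0.5)
kept_tasks = [ex.task for ex in kept]
```

In this toy run the low-confidence dialogue pair is dropped before the remaining examples would be combined for multi-task training.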

Keywords

text generation; multi-task learning; information retrieval

Cite

@article{arxiv.2207.03030,
  title  = {Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling},
  author = {Sebastian Hofstätter and Jiecao Chen and Karthik Raman and Hamed Zamani},
  journal= {arXiv preprint arXiv:2207.03030},
  year   = {2022}
}

Comments

Accepted at the ICML 2022 Workshop on Knowledge Retrieval and Language Models (KRLM)
