CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking

Information Retrieval 2023-10-24 v3 Artificial Intelligence

Abstract

Contrastive learning has been the dominant approach to training dense retrieval models. In this work, we investigate the impact of ranking context, an often overlooked aspect of learning dense retrieval models. In particular, we examine the effect of its constituent parts: jointly scoring a large number of negatives per query, using retrieved (query-specific) instead of random negatives, and a fully list-wise loss. To incorporate these factors into training, we introduce Contextual Document Embedding Reranking (CODER), a highly efficient retrieval framework. When reranking, it incurs only a negligible computational overhead on top of a first-stage method at run time (delay per query on the order of milliseconds), allowing it to be easily combined with any state-of-the-art dual encoder method. After fine-tuning through CODER, which is a lightweight and fast process, models can also be used as stand-alone retrievers. Evaluating CODER in a large set of experiments on the MS MARCO and TripClick collections, we show that the contextual reranking of precomputed document embeddings leads to a significant improvement in retrieval performance. This improvement becomes even more pronounced when more relevance information per query is available, as shown on the TripClick collection, where we establish new state-of-the-art results by a large margin.
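
The abstract does not spell out the scoring function or the exact loss, so the following is only a minimal sketch of the general idea under stated assumptions: candidates returned by a first-stage retriever are scored by a dot product between a query embedding and precomputed document embeddings, and a softmax cross-entropy over the whole candidate list stands in for a list-wise loss. The function names (`score_candidates`, `rerank`, `listwise_loss`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def score_candidates(query_emb, doc_embs):
    """Dot-product score of one query embedding against each precomputed
    document embedding (assumed scoring function, not confirmed by the source)."""
    return [sum(q * d for q, d in zip(query_emb, doc)) for doc in doc_embs]

def rerank(query_emb, doc_ids, doc_embs):
    """Reorder first-stage candidates by descending score."""
    scores = score_candidates(query_emb, doc_embs)
    order = sorted(range(len(doc_ids)), key=lambda i: scores[i], reverse=True)
    return [(doc_ids[i], scores[i]) for i in order]

def listwise_loss(scores, relevance):
    """Cross-entropy between softmax(scores) and normalized relevance labels,
    computed jointly over every candidate in the list.
    Assumes at least one candidate has a non-zero relevance label."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    total = sum(relevance)
    return -sum((r / total) * (s - log_z) for s, r in zip(scores, relevance))

# Toy example: three candidates with 2-dimensional embeddings.
query = [1.0, 0.0]
ids = ["d1", "d2", "d3"]
embs = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]]
ranked = rerank(query, ids, embs)
```

Because the document embeddings are computed once and only looked up at query time, the per-query work in this sketch reduces to one query encoding plus a dot product per candidate, which is consistent with the low reranking overhead the abstract describes.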

Cite

@article{arxiv.2112.08766,
  title  = {CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking},
  author = {George Zerveas and Navid Rekabsaz and Daniel Cohen and Carsten Eickhoff},
  journal= {arXiv preprint arXiv:2112.08766},
  year   = {2023}
}

Comments

EMNLP 2022
