
LEMUR: Learned Multi-Vector Retrieval

Information Retrieval; Machine Learning (v2, 2026-05-22)

Abstract

Multi-vector representations generated by late interaction models, such as ColBERT, enable superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval systems, both queries and documents are encoded using one embedding per token, and similarity between queries and documents is measured by the MaxSim similarity measure. However, the improved quality of multi-vector retrieval comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, enabling the use of existing single-vector search indexes to accelerate retrieval. LEMUR is an order of magnitude faster than prior multi-vector similarity search methods. Our code is available at https://github.com/ejaasaari/lemur
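The abstract describes MaxSim and the reduction of inference to single-vector search only in words. The pure-Python sketch below illustrates both ideas under stated assumptions. `maxsim` and the brute-force `exact_top_k` follow the standard MaxSim definition, MaxSim(Q, D) = Σ_i max_j ⟨q_i, d_j⟩. `hidden`, `latent_query`, `W1` and `W2` are hypothetical. They assume a one-hidden-layer model whose score for document j is a sum over query tokens of ⟨w_j, ReLU(W1 q_i)⟩, with one output weight vector w_j per document. Under that assumption, linearity lets the sum over query tokens move inside the inner product, so scoring every document becomes a single inner-product search with one latent query vector. This is not the LEMUR implementation. Training the model to predict MaxSim scores and the actual search index are both omitted.

```python
import math
import random

# Type alias: one embedding vector.
Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: list[Vector], doc: list[Vector]) -> float:
    """MaxSim: for each query token embedding, take its largest inner product
    with any document token embedding, then sum over the query tokens."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def exact_top_k(query: list[Vector], docs: list[list[Vector]], k: int) -> list[int]:
    """Brute-force multi-vector search: score every document with MaxSim."""
    ranked = sorted(range(len(docs)), key=lambda j: maxsim(query, docs[j]), reverse=True)
    return ranked[:k]


def hidden(W1: list[Vector], token: Vector) -> Vector:
    """Hypothetical hidden layer: phi(q) = ReLU(W1 q)."""
    return [max(0.0, dot(row, token)) for row in W1]


def latent_query(W1: list[Vector], query: list[Vector]) -> Vector:
    """Sum the hidden representations of all query tokens into one vector."""
    z = [0.0] * len(W1)
    for q in query:
        for i, value in enumerate(hidden(W1, q)):
            z[i] += value
    return z


def scores_per_token(W1: list[Vector], W2: list[Vector], query: list[Vector]) -> list[float]:
    """Hypothetical model output for document j: sum_i <w_j, phi(q_i)>."""
    return [sum(dot(w, hidden(W1, q)) for q in query) for w in W2]


def scores_latent(W1: list[Vector], W2: list[Vector], query: list[Vector]) -> list[float]:
    """The same scores computed as single-vector inner products in latent space."""
    z = latent_query(W1, query)
    return [dot(w, z) for w in W2]


if __name__ == "__main__":
    random.seed(0)
    dim, hid, n_docs = 4, 8, 5

    def rand_vec(n: int) -> Vector:
        return [random.gauss(0.0, 1.0) for _ in range(n)]

    query = [rand_vec(dim) for _ in range(3)]  # 3 query token embeddings
    docs = [[rand_vec(dim) for _ in range(6)] for _ in range(n_docs)]
    print("exact MaxSim top-2:", exact_top_k(query, docs, 2))

    # Untrained random weights: this checks only the algebraic identity,
    # not retrieval quality (fitting W1, W2 to MaxSim scores is omitted).
    W1 = [rand_vec(dim) for _ in range(hid)]
    W2 = [rand_vec(hid) for _ in range(n_docs)]
    a, b = scores_per_token(W1, W2, query), scores_latent(W1, W2, query)
    assert all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))
```

The latent query is a single vector, so the final loop in `scores_latent` is the kind of operation an off-the-shelf maximum inner product search index could accelerate. Here it is done by brute force for clarity.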

Cite

@article{arxiv.2601.21853,
  title  = {LEMUR: Learned Multi-Vector Retrieval},
  author = {Elias Jääsaari and Ville Hyvönen and Teemu Roos},
  journal= {arXiv preprint arXiv:2601.21853},
  year   = {2026}
}

Comments

Accepted to ICML 2026
