Related papers: LEMUR: Learned Multi-Vector Retrieval

LIR: The First Workshop on Late Interaction and Multi Vector Retrieval @ ECIR 2026

Late interaction retrieval methods, pioneered by ColBERT, have emerged as a powerful alternative to single-vector neural IR. By leveraging fine-grained, token-level representations, they have been demonstrated to deliver strong…

Information Retrieval · Computer Science 2025-11-04 Benjamin Clavié, Xianming Li, Antoine Chaffin, Omar Khattab, Tom Aarsen, Manuel Faysse, Jing Li

MuMUR: Multilingual Multimodal Universal Retrieval

Multi-modal retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which is a huge manual effort. In this paper, we propose a framework…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

Investigating Multi-layer Representations for Dense Passage Retrieval

Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, which is in contrast to the fact that representations in different layers of a pre-trained language model usually…

Information Retrieval · Computer Science 2025-09-30 Zhongbin Xie, Thomas Lukasiewicz

Incorporating Token Importance in Multi-Vector Retrieval

ColBERT introduced a late interaction mechanism that independently encodes queries and documents using BERT, and computes similarity via fine-grained interactions over token-level vector representations. This design enables expressive…

Information Retrieval · Computer Science 2025-11-21 Archish S, Ankit Garg, Kirankumar Shiragur, Neeraj Kayal

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding $x \in \mathbb{R}^d$ per data-point, allowing for fast retrieval via highly optimized…

Data Structures and Algorithms · Computer Science 2024-05-31 Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, Vahab Mirrokni

LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval

Large language models (LLMs) are increasingly used to access legal information. Yet, their deployment in multilingual legal settings is constrained by unreliable retrieval and the lack of domain-adapted, open-embedding models. In…

Computation and Language · Computer Science 2026-02-11 Narges Baba Ahmadi, Jan Strich, Martin Semmann, Chris Biemann

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Cross-modal retrieval is gaining increasing efficacy and interest from the research community, thanks to large-scale training, novel architectural and learning designs, and its application in LLMs and multimodal LLMs. In this paper, we move…

Computer Vision and Pattern Recognition · Computer Science 2025-03-05 Davide Caffagni, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

This paper introduces Sparsified Late Interaction for Multi-vector (SLIM) retrieval with inverted indexes. Multi-vector retrieval methods have demonstrated their effectiveness on various retrieval datasets, and among them, ColBERT is the…

Information Retrieval · Computer Science 2023-05-10 Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking…

Information Retrieval · Computer Science 2020-06-05 Omar Khattab, Matei Zaharia

LEMUR: Large scale End-to-end MUltimodal Recommendation

Traditional ID-based recommender systems often struggle with cold-start and generalization challenges. Multimodal recommendation systems, which leverage textual and visual data, offer a promising solution to mitigate these issues. However,…

Information Retrieval · Computer Science 2025-11-18 Xintian Han, Honggang Chen, Quan Lin, Jingyue Gao, Xiangyuan Ren, Lifei Zhu, Zhisheng Ye, Shikang Wu, XiongHang Xie, Xiaochu Gan, Bingzheng Wei, Peng Xu, Zhe Wang, Yuchao Zheng, Jingjian Lin, Di Wu, Junfeng Ge

LLM-assisted Vector Similarity Search

As data retrieval demands become increasingly complex, traditional search methods often fall short in addressing nuanced and conceptual queries. Vector similarity search has emerged as a promising technique for finding semantically similar…

Artificial Intelligence · Computer Science 2024-12-31 Md Riyadh, Muqi Li, Felix Haryanto Lie, Jia Long Loh, Haotian Mi, Sayam Bohra

Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval

Most text retrievers generate one query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We…

Computation and Language · Computer Science 2025-11-05 Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi

GEM: A Native Graph-based Index for Multi-Vector Retrieval

In multi-vector retrieval, both queries and data are represented as sets of high-dimensional vectors, enabling finer-grained semantic matching and improving retrieval quality over single-vector approaches. However, its practical adoption is…

Information Retrieval · Computer Science 2026-03-24 Yao Tian, Zhoujin Tian, Xi Zhao, Ruiyuan Zhang, Xiaofang Zhou

VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search

Traditional retrieval methods have been essential for assessing document similarity but struggle with capturing semantic nuances. Despite advancements in latent semantic analysis (LSA) and deep learning, achieving comprehensive semantic…

Information Retrieval · Computer Science 2024-09-27 Solmaz Seyed Monir, Irene Lau, Shubing Yang, Dongfang Zhao

LEMUR Neural Network Dataset: Towards Seamless AutoML

Neural networks are the backbone of modern artificial intelligence, but designing, evaluating, and comparing them remains labor-intensive. While numerous datasets exist for training, there are few standardized collections of the models…

Machine Learning · Computer Science 2025-09-25 Arash Torabi Goodarzi, Roman Kochnev, Waleed Khalid, Hojjat Torabi Goudarzi, Furui Qin, Tolgay Atinc Uzun, Yashkumar Sanjaybhai Dhameliya, Yash Kanubhai Kathiriya, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

ColBERT-Att: Late-Interaction Meets Attention for Enhanced Retrieval

Vector embeddings from pre-trained language models form a core component in Neural Information Retrieval systems across a multitude of knowledge extraction tasks. The paradigm of late interaction, introduced in ColBERT, demonstrates high…

Information Retrieval · Computer Science 2026-03-27 Raj Nath Patel, Sourav Dutta

Efficient Constant-Space Multi-Vector Retrieval

Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness. However, they come at a high cost in terms…

Information Retrieval · Computer Science 2025-04-03 Sean MacAvaney, Antonio Mallia, Nicola Tonellotto

MINER: Mining Multimodal Internal Representation for Efficient Retrieval

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but…

Machine Learning · Computer Science 2026-05-08 Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

State-of-the-art retrieval models typically address a straightforward search scenario, in which retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a single modality is supported for both queries and…

Computation and Language · Computer Science 2025-02-25 Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi, Jimmy Lin, Bryan Catanzaro, Wei Ping

NextLevelBERT: Masked Language Modeling with Higher-Level Representations for Long Documents

While (large) language models have significantly improved over the last years, they still struggle to sensibly process long sequences found, e.g., in books, due to the quadratic scaling of the underlying attention mechanism. To address…

Computation and Language · Computer Science 2024-06-14 Tamara Czinczoll, Christoph Hönes, Maximilian Schall, Gerard de Melo