Related papers: Multi-Vector Retrieval as Sparse Alignment

200 papers

Learned multivector representations power modern search systems with strong retrieval effectiveness, but their real-world use is limited by the high cost of exhaustive token-level retrieval. Therefore, most systems adopt a…

Information Retrieval · Computer Science 2026-01-19 Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Owing to the pre-computed inverted index, it supports fast ad-hoc search but incurs the vocabulary mismatch problem. Although…

Information Retrieval · Computer Science 2023-10-06 Eunseong Choi, Sunkyung Lee, Minjin Choi, Hyeseon Ko, Young-In Song, Jongwuk Lee

We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage…

Information Retrieval · Computer Science 2026-02-25 Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme

With the increasing accessibility and utilization of multilingual documents, Cross-Lingual Information Retrieval (CLIR) has emerged as an important research area. Conventionally, CLIR tasks have been conducted under settings where the…

Information Retrieval · Computer Science 2026-04-08 Seongtae Hong, Youngjoon Jang, Jungseob Lee, Hyeonseok Moon, Heuiseok Lim

Text-Video Retrieval (TVR) methods typically match query-candidate pairs by aligning text and video features in coarse-grained, fine-grained, or combined (coarse-to-fine) manners. However, these frameworks predominantly employ a…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Bingqing Zhang, Zhuo Cao, Heming Du, Xin Yu, Xue Li, Jiajun Liu, Sen Wang

Most text retrievers generate one query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We…

Computation and Language · Computer Science 2025-11-05 Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi

Pairwise re-ranking models predict which of two documents is more relevant to a query and then aggregate a final ranking from such preferences. This is often more effective than pointwise re-ranking models that directly predict a relevance…

Information Retrieval · Computer Science 2022-07-12 Lukas Gienapp, Maik Fröbe, Matthias Hagen, Martin Potthast

This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition where feature vectors in a database are encoded as compact codes in order to speed up the similarity search in large-scale databases.…

Information Theory · Computer Science 2017-04-26 Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov, Taras Holotyak

Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, sentence and document…

Computation and Language · Computer Science 2020-10-06 Xuhui Zhou, Nikolaos Pappas, Noah A. Smith

Visual document retrieval has become essential for accessing information in visually rich documents. Existing approaches fall into two camps. Late-interaction retrievers achieve strong quality through fine-grained token-level matching but…

Machine Learning · Computer Science 2026-05-08 Weien Li, Rui Song, Zeyu Li, Haochen Liu, Gonghao Zhang, Difan Jiao, Zhenwei Tang, Bowei He, Haolun Wu, Xue Liu, Ye Yuan

Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness. However, they come at a high cost in terms…

Information Retrieval · Computer Science 2025-04-03 Sean MacAvaney, Antonio Mallia, Nicola Tonellotto

In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat…

Computation and Language · Computer Science 2021-12-22 Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state of the art on many information retrieval benchmarks. However, their non-linear…

Computation and Language · Computer Science 2024-04-10 Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao

Multimodal documents contain diverse elements, such as tables, figures, and layouts, which can complicate retrieval tasks. While current approaches typically combine dense visual embedding models with supervised rerankers to achieve…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Ruofan Hu, Menghui Zhu, Jieming Zhu, Bo Chen, Shengyang Xu, Minjie Hong, Xiaoda Yang, Sashuai Zhou, Li Tang, Tao Jin, Zhou Zhao

While multi-vector retrieval models outperform single-vector models of comparable size in retrieval quality, their practicality is limited by substantially larger index sizes, driven by the additional sequence-length dimension in their…

Information Retrieval · Computer Science 2026-03-25 Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme

Aligning parallel sentences in multilingual corpora is essential to curating data for downstream applications such as Machine Translation. In this work, we present OneAligner, an alignment model specially designed for sentence retrieval…

Computation and Language · Computer Science 2022-05-19 Tong Niu, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

In this paper, we investigate the problem of optimizing multivariate performance measures, and propose a novel algorithm for it. Different from traditional machine learning methods which optimize simple loss functions to learn prediction…

Machine Learning · Computer Science 2015-08-03 Jiachen Yang, Zhiyong Ding, Fei Guo, Huogen Wang, Nick Hughes

In complex visual recognition tasks it is typical to adopt multiple descriptors, that describe different aspects of the images, for obtaining an improved recognition performance. Descriptors that have diverse forms can be fused into a…

Computer Vision and Pattern Recognition · Computer Science 2015-06-15 Jayaraman J. Thiagarajan, Karthikeyan Natesan Ramamurthy, Andreas Spanias

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words…

Computation and Language · Computer Science 2021-02-18 Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins

Learned Sparse Retrieval (LSR) is an effective IR approach that exploits pre-trained language models for encoding text into a learned bag of words. Several efforts in the literature have shown that sparsity is key to enabling a good…

Information Retrieval · Computer Science 2025-05-06 Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates