Related papers: The Curse of Dense Low-Dimensional Information Ret…

Searching Dense Representations with Inverted Indexes

Nearly all implementations of top-$k$ retrieval with dense vector representations today take advantage of hierarchical navigable small-world network (HNSW) indexes. However, the generation of vector representations and efficiently searching…

Information Retrieval · Computer Science 2023-12-05 Jimmy Lin, Tommaso Teofili

Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence

Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robustness is critical to avoid downstream…

Computation and Language · Computer Science 2025-06-04 Mohsen Fayyaz, Ali Modarressi, Hinrich Schuetze, Nanyun Peng

CSPLADE: Learned Sparse Retrieval with Causal Language Models

In recent years, dense retrieval has been the focus of information retrieval (IR) research. While effective, dense retrieval produces uninterpretable dense vectors, and suffers from the drawback of large index size. Learned sparse retrieval…

Information Retrieval · Computer Science 2025-11-10 Zhichao Xu, Aosong Feng, Yijun Tian, Haibo Ding, Lin Lee Cheong

Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval

The semantic matching capabilities of neural information retrieval can ameliorate synonymy and polysemy problems of symbolic approaches. However, neural models' dense representations are more suitable for re-ranking, due to their…

Computation and Language · Computer Science 2021-10-18 Kyoung-Rok Jang, Junmo Kang, Giwon Hong, Sung-Hyon Myaeng, Joohee Park, Taewon Yoon, Heecheol Seo

The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems

Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes…

Information Retrieval · Computer Science 2025-05-19 Petr Kasalický, Martin Spišák, Vojtěch Vančura, Daniel Bohuněk, Rodrigo Alves, Pavel Kordík

Dense Passage Retrieval for Open-Domain Question Answering

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be…

Computation and Language · Computer Science 2020-10-02 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih

Information Retrieval with Entity Linking

Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result,…

Information Retrieval · Computer Science 2024-04-16 Dahlia Shehata

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Learned sparse and dense representations capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust. Prior work combines dense and sparse retrievers by fusing their…

Information Retrieval · Computer Science 2021-12-10 Sheng-Chieh Lin, Jimmy Lin

Answering Multimodal Exclusion Queries with Lightweight Sparse Disentangled Representations

Multimodal representations that enable cross-modal retrieval are widely used. However, these often lack interpretability making it difficult to explain the retrieved results. Solutions such as learning sparse disentangled representations…

Information Retrieval · Computer Science 2025-06-25 Prachi J, Sumit Bhatia, Srikanta Bedathur

Statistical Foundations of DIME: Risk Estimation for Practical Index Selection

High-dimensional dense embeddings have become central to modern Information Retrieval, but many dimensions are noisy or redundant. The recently proposed DIME (Dimension IMportance Estimation) provides query-dependent scores to identify…

Information Retrieval · Computer Science 2026-04-13 Giulio D'Erasmo, Cesare Campagnano, Antonio Mallia, Pierpaolo Brutti, Nicola Tonellotto, Fabrizio Silvestri

Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

Recently, the retrieval models based on dense representations have been gradually applied in the first stage of the document retrieval tasks, showing better performance than traditional sparse vector space models. To obtain high efficiency,…

Information Retrieval · Computer Science 2021-08-20 Hongyin Tang, Xingwu Sun, Beihong Jin, Jingang Wang, Fuzheng Zhang, Wei Wu

Interpretable Neural Embeddings with Sparse Self-Representation

Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…

Computation and Language · Computer Science 2023-06-27 Minxue Xia, Hao Zhu

Scaling Laws for Embedding Dimension in Information Retrieval

Dense retrieval, which encodes queries and documents into a single dense vector, has become the dominant neural retrieval approach due to its simplicity and compatibility with fast approximate nearest neighbor algorithms. As the tasks dense…

Information Retrieval · Computer Science 2026-02-06 Julian Killingback, Mahta Rafiee, Madine Manas, Hamed Zamani

Faster Learned Sparse Retrieval with Guided Traversal

Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very…

Information Retrieval · Computer Science 2022-04-26 Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Neural retrieval models (NRMs) have been shown to outperform their statistical counterparts owing to their ability to capture semantic meaning via dense document representations. These models, however, suffer from poor interpretability as…

Information Retrieval · Computer Science 2023-04-26 Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal

Generative Retrieval Overcomes Limitations of Dense Retrieval but Struggles with Identifier Ambiguity

While dense retrieval models, which embed queries and documents into a shared low-dimensional space, have gained widespread popularity, they were shown to exhibit important theoretical limitations and considerably lag behind traditional…

Information Retrieval · Computer Science 2026-04-09 Adrian Bracher, Svitlana Vakulenko

An Empirical Study of Position Bias in Modern Information Retrieval

This study investigates the position bias in information retrieval, where models tend to overemphasize content at the beginning of passages while neglecting semantically relevant information that appears later. To analyze the extent and…

Information Retrieval · Computer Science 2025-09-19 Ziyang Zeng, Dun Zhang, Jiacheng Li, Panxiang Zou, Yudong Zhou, Yuqing Yang

Sparse, Dense, and Attentional Representations for Text Retrieval

Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words…

Computation and Language · Computer Science 2021-02-18 Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins

ECLIPSE: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval

Recent advances in Information Retrieval have leveraged high-dimensional embedding spaces to improve the retrieval of relevant documents. Moreover, the Manifold Clustering Hypothesis suggests that despite these high-dimensional…

Information Retrieval · Computer Science 2024-12-20 Giulio D'Erasmo, Giovanni Trappolini, Nicola Tonellotto, Fabrizio Silvestri

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science 2020-10-21 Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma