Related papers: DeeperImpact: Optimizing Sparse Learned Index Stru…

Learning Passage Impacts for Inverted Indexes

Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as…

Information Retrieval · Computer Science 2021-04-27 Antonio Mallia, Omar Khattab, Nicola Tonellotto, Torsten Suel

Faster Learned Sparse Retrieval with Guided Traversal

Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very…

Information Retrieval · Computer Science 2022-04-26 Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto

SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

In neural Information Retrieval (IR), ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven…

Information Retrieval · Computer Science 2021-09-22 Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant

A Static Pruning Study on Sparse Neural Retrievers

Sparse neural retrievers, such as DeepImpact, uniCOIL and SPLADE, have been introduced recently as an efficient and effective way to perform retrieval with inverted indexes. They aim to learn term importance and, in some cases, document…

Information Retrieval · Computer Science 2023-04-26 Carlos Lassance, Simon Lupart, Hervé Dejean, Stéphane Clinchant, Nicola Tonellotto

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective

Neural retrievers based on dense representations combined with Approximate Nearest Neighbors search have recently received a lot of attention, owing their success to distillation and/or better sampling of examples for training -- while…

Information Retrieval · Computer Science 2022-05-13 Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant

Forward Index Compression for Learned Sparse Retrieval

Text retrieval using learned sparse representations of queries and documents has, over the years, evolved into a highly effective approach to search. It is thanks to recent advances in approximate nearest neighbor search -- with the emergence…

Information Retrieval · Computer Science 2026-02-06 Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

Improved Learned Sparse Retrieval with Corpus-Specific Vocabularies

We explore leveraging corpus-specific vocabularies that improve both efficiency and effectiveness of learned sparse retrieval systems. We find that pre-training the underlying BERT model on the target corpus, specifically targeting…

Information Retrieval · Computer Science 2024-01-15 Puxuan Yu, Antonio Mallia, Matthias Petri

DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index

Web search provides a promising way for people to obtain information and has been extensively studied. With the surgence of deep learning and large-scale pre-training techniques, various neural information retrieval models are proposed and…

Information Retrieval · Computer Science 2022-03-02 Yujia Zhou, Jing Yao, Zhicheng Dou, Ledell Wu, Ji-Rong Wen

SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Owing to the pre-computed inverted index, it supports fast ad-hoc search but incurs the vocabulary mismatch problem. Although…

Information Retrieval · Computer Science 2023-10-06 Eunseong Choi, Sunkyung Lee, Minjin Choi, Hyeseon Ko, Young-In Song, Jongwuk Lee

Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval

Vision-Language Pretrained (VLP) models have achieved impressive performance on multimodal tasks, including text-image retrieval, based on dense representations. Meanwhile, Learned Sparse Retrieval (LSR) has gained traction in text-only…

Computation and Language · Computer Science 2025-08-26 Jonghyun Song, Youngjune Lee, Gyu-Hwung Cho, Ilhyeon Song, Saehun Kim, Yohan Jo

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to…

Information Retrieval · Computer Science 2021-07-14 Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

Information Retrieval with Entity Linking

Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result,…

Information Retrieval · Computer Science 2024-04-16 Dahlia Shehata

The Role of Vocabularies in Learning Sparse Representations for Ranking

Learned Sparse Retrieval (LSR) such as SPLADE has growing interest for effective semantic 1st stage matching while enjoying the efficiency of inverted indices. A recent work on learning SPLADE models with expanded vocabularies (ESPLADE) was…

Information Retrieval · Computer Science 2026-04-21 Hiun Kim, Tae Kwan Lee, Taeryun Won

SPLATE: Sparse Late Interaction Retrieval

The late interaction paradigm introduced with ColBERT stands out in the neural Information Retrieval space, offering a compelling effectiveness-efficiency trade-off across many benchmarks. Efficient late interaction retrieval is based on an…

Information Retrieval · Computer Science 2024-04-23 Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance

Sparse Meets Dense: A Hybrid Approach to Enhance Scientific Document Retrieval

Traditional information retrieval is based on sparse bag-of-words vector representations of documents and queries. More recent deep-learning approaches have used dense embeddings learned using a transformer-based large language model. We…

Information Retrieval · Computer Science 2024-01-09 Priyanka Mandikal, Raymond Mooney

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science 2020-10-21 Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers

Learned sparse retrieval, which can efficiently perform retrieval through mature inverted-index engines, has garnered growing attention in recent years. Particularly, the inference-free sparse retrievers are attractive as they eliminate…

Information Retrieval · Computer Science 2025-07-02 Zhichao Geng, Yiwen Wang, Dongyu Ru, Yang Yang

Efficient and Interpretable Information Retrieval for Product Question Answering with Heterogeneous Data

Expansion-enhanced sparse lexical representation improves information retrieval (IR) by minimizing vocabulary mismatch problems during lexical matching. In this paper, we explore the potential of jointly learning dense semantic…

Machine Learning · Computer Science 2024-05-24 Biplob Biswas, Rajiv Ramnath

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and…

Computation and Language · Computer Science 2026-03-13 Yushi Bai, Qian Dong, Ting Jiang, Xin Lv, Zhengxiao Du, Aohan Zeng, Jie Tang, Juanzi Li

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with…

Information Retrieval · Computer Science 2024-07-15 Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini