Related papers: Deep Learning Based Dense Retrieval: A Comparative…

Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training

Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios. Contrastive pre-training, which constructs pseudo-positive examples from unlabeled data, has shown great…

Information Retrieval · Computer Science 2023-06-07 Yibin Lei, Liang Ding, Yu Cao, Changtong Zan, Andrew Yates, Dacheng Tao

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

Decoder-only large language models (LLMs) are increasingly replacing BERT-style architectures as the backbone for dense retrieval, achieving substantial performance gains and broad adoption. However, the robustness of these LLM-based…

Information Retrieval · Computer Science 2026-04-21 Yongkang Li, Panagiotis Eustratiadis, Yixing Fan, Evangelos Kanoulas

Predicting Efficiency/Effectiveness Trade-offs for Dense vs. Sparse Retrieval Strategy Selection

Over the last few years, contextualized pre-trained transformer models such as BERT have provided substantial improvements on information retrieval tasks. Recent approaches based on pre-trained transformer models such as BERT fine-tune…

Information Retrieval · Computer Science 2021-09-23 Negar Arabzadeh, Xinyi Yan, Charles L. A. Clarke

Poisoning Retrieval Corpora by Injecting Adversarial Passages

Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack for dense retrieval…

Computation and Language · Computer Science 2023-10-31 Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on encoded embeddings to capture the semantics of queries and…

Information Retrieval · Computer Science 2021-09-01 HongChien Yu, Chenyan Xiong, Jamie Callan

Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval

Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered fine-tuning pipelines to realize their full…

Information Retrieval · Computer Science 2021-08-13 Luyu Gao, Jamie Callan

Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Dense retrieval systems have been widely used in various NLP applications. However, their vulnerabilities to potential attacks have been underexplored. This paper investigates a novel attack scenario where the attackers aim to mislead the…

Computation and Language · Computer Science 2025-08-26 Quanyu Long, Yue Deng, LeiLei Gan, Wenya Wang, Sinno Jialin Pan

Improving Document Retrieval Coherence for Semantically Equivalent Queries

Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous…

Information Retrieval · Computer Science 2025-08-12 Stefano Campese, Alessandro Moschitti, Ivano Lauriola

Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks

Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning - a mechanism for backdooring models. A small number of poisoned samples seen during training can severely undermine a model's…

Cryptography and Security · Computer Science 2023-06-30 Nils Lukas, Florian Kerschbaum

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science 2020-10-21 Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization

In open-domain Question Answering (QA), dense retrieval is crucial for finding relevant passages for answer generation. Typically, contrastive learning is used to train a retrieval model that maps passages and queries to the same semantic…

Computation and Language · Computer Science 2024-01-17 Shiqi Wang, Yeqin Zhang, Cam-Tu Nguyen

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we…

Information Retrieval · Computer Science 2020-10-22 Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk

Unsupervised dense retrieval with counterfactual contrastive learning

Efficiently retrieving a concise set of candidates from a large document corpus remains a pivotal challenge in Information Retrieval (IR). Neural retrieval models, particularly dense retrieval models built with transformers and pretrained…

Information Retrieval · Computer Science 2025-01-01 Haitian Chen, Qingyao Ai, Xiao Wang, Yiqun Liu, Fen Lin, Qin Liu

Unsupervised Dense Information Retrieval with Contrastive Learning

Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and…

Information Retrieval · Computer Science 2022-08-30 Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

Pseudo-Relevance Feedback (PRF) utilises the relevance signals from the top-k passages from the first round of retrieval to perform a second round of retrieval aiming to improve search effectiveness. A recent research direction has been the…

Information Retrieval · Computer Science 2023-03-22 Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos

Current dense retrievers are not robust to out-of-domain and outlier queries, i.e. their effectiveness on these queries is much poorer than what one would expect. In this paper, we consider a specific instance of such queries: queries that…

Information Retrieval · Computer Science 2022-04-27 Shengyao Zhuang, Guido Zuccon

Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition

Dense retrievers utilize pre-trained backbone language models (e.g., BERT, LLaMA) that are fine-tuned via contrastive learning to perform the task of encoding text into dense representations that can then be compared via a shallow…

Information Retrieval · Computer Science 2025-05-13 Zheng Yao, Shuai Wang, Guido Zuccon

Dynamic Injection of Entity Knowledge into Dense Retrievers

Dense retrievers often struggle with queries involving less-frequent entities due to their limited entity knowledge. We propose the Knowledgeable Passage Retriever (KPR), a BERT-based retriever enhanced with a context-entity attention layer…

Computation and Language · Computer Science 2025-09-09 Ikuya Yamada, Ryokan Ri, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Sparse DNNs with Improved Adversarial Robustness

Deep neural networks (DNNs) are computationally/memory-intensive and vulnerable to adversarial attacks, making them prohibitive in some real-world applications. By converting dense models into sparse ones, pruning appears to be a promising…

Machine Learning · Computer Science 2019-11-07 Yiwen Guo, Chao Zhang, Changshui Zhang, Yurong Chen

MultiContrievers: Analysis of Dense Retrieval Representations

Dense retrievers compress source documents into (possibly lossy) vector representations, yet there is little analysis of what information is lost versus preserved, and how it affects downstream tasks. We conduct the first analysis of the…

Computation and Language · Computer Science 2024-10-07 Seraphina Goldfarb-Tarrant, Pedro Rodriguez, Jane Dwivedi-Yu, Patrick Lewis