Related papers: Multivariate Representation Learning for Informati…

Multi-View Document Representation Learning for Open-Domain Dense Retrieval

Dense retrieval has achieved impressive advances in first-stage retrieval from a large-scale document collection, which is built on bi-encoder architecture to produce single vector representation of query and document. However, a document…

Computation and Language · Computer Science · 2022-03-17 · Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

Investigating Multi-layer Representations for Dense Passage Retrieval

Dense retrieval models usually adopt vectors from the last hidden layer of the document encoder to represent a document, which is in contrast to the fact that representations in different layers of a pre-trained language model usually…

Information Retrieval · Computer Science · 2025-09-30 · Zhongbin Xie, Thomas Lukasiewicz

Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval

In this paper, we propose a new dense retrieval model which learns diverse document representations with deep query interactions. Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view…

Information Retrieval · Computer Science · 2022-08-09 · Zehan Li, Nan Yang, Liang Wang, Furu Wei

On the Value of Behavioral Representations for Dense Retrieval

We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution. Most of the…

Information Retrieval · Computer Science · 2022-08-12 · Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan

Improving Document Retrieval Coherence for Semantically Equivalent Queries

Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous…

Information Retrieval · Computer Science · 2025-08-12 · Stefano Campese, Alessandro Moschitti, Ivano Lauriola

CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion

The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and document independently, thus failing to fully capture the interactions between the query and…

Computation and Language · Computer Science · 2023-10-31 · Xingwei He, Yeyun Gong, A-Long Jin, Hang Zhang, Anlei Dong, Jian Jiao, Siu Ming Yiu, Nan Duan

Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

Recently, the retrieval models based on dense representations have been gradually applied in the first stage of the document retrieval tasks, showing better performance than traditional sparse vector space models. To obtain high efficiency,…

Information Retrieval · Computer Science · 2021-08-20 · Hongyin Tang, Xingwu Sun, Beihong Jin, Jingang Wang, Fuzheng Zhang, Wei Wu

Vector representations of text data in deep learning

In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level,…

Computation and Language · Computer Science · 2019-01-08 · Karol Grzegorczyk

A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval

Dense retrieval (DR) has shown promising results in information retrieval. In essence, DR requires high-quality text representations to support effective search in the representation space. Recent studies have shown that pre-trained…

Information Retrieval · Computer Science · 2022-08-23 · Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science · 2020-10-21 · Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Deep Metric Learning using Similarities from Nonlinear Rank Approximations

In recent years, deep metric learning has achieved promising results in learning high dimensional semantic feature embeddings where the spatial relationships of the feature vectors match the visual similarities of the images. Similarity…

Machine Learning · Computer Science · 2019-09-25 · Konstantin Schall, Kai Uwe Barthel, Nico Hezel, Klaus Jung

Learning Dense Representations of Phrases at Scale

Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse…

Computation and Language · Computer Science · 2021-06-03 · Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen

Relevance-based Word Embedding

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically…

Information Retrieval · Computer Science · 2017-07-18 · Hamed Zamani, W. Bruce Croft

More Robust Dense Retrieval with Contrastive Dual Learning

Dense retrieval conducts text retrieval in the embedding space and has shown many advantages compared to sparse retrieval. Existing dense retrievers optimize representations of queries and documents with contrastive training and map them to…

Information Retrieval · Computer Science · 2021-07-19 · Yizhi Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor…

Information Retrieval · Computer Science · 2021-10-13 · Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma

Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling

Dense retrieval is a crucial task in Information Retrieval (IR), serving as the basis for downstream tasks such as re-ranking and augmenting generation. Recently, large language models (LLMs) have demonstrated impressive semantic…

Information Retrieval · Computer Science · 2025-08-20 · Hengran Zhang, Keping Bi, Jiafeng Guo, Xiaojie Sun, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng

Learning to Match Using Local and Distributed Representations of Text for Web Search

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on…

Information Retrieval · Computer Science · 2016-10-27 · Bhaskar Mitra, Fernando Diaz, Nick Craswell

MultiContrievers: Analysis of Dense Retrieval Representations

Dense retrievers compress source documents into (possibly lossy) vector representations, yet there is little analysis of what information is lost versus preserved, and how it affects downstream tasks. We conduct the first analysis of the…

Computation and Language · Computer Science · 2024-10-07 · Seraphina Goldfarb-Tarrant, Pedro Rodriguez, Jane Dwivedi-Yu, Patrick Lewis

A Proposed Conceptual Framework for a Representational Approach to Information Retrieval

This paper outlines a conceptual framework for understanding recent developments in information retrieval and natural language processing that attempts to integrate dense and sparse retrieval methods. I propose a representational approach…

Information Retrieval · Computer Science · 2021-12-30 · Jimmy Lin

Typo-Robust Representation Learning for Dense Retrieval

Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling…

Information Retrieval · Computer Science · 2023-06-21 · Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong