Related papers: CoRECT: A Framework for Evaluating Embedding Compr…

200 papers

Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in…

Information Retrieval · Computer Science 2021-08-25 Nicola Tonellotto , Craig Macdonald

Learnable embedding vectors are one of the most important applications in machine learning and are widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge volume of…

Machine Learning · Computer Science 2024-02-14 Hailin Zhang , Penghao Zhao , Xupeng Miao , Yingxia Shao , Zirui Liu , Tong Yang , Bin Cui

Industry-scale recommender systems face a core challenge: representing entities with high cardinality, such as users or items, using dense embeddings that must be accessible during both training and inference. However, as embedding sizes…

Information Retrieval · Computer Science 2025-05-19 Petr Kasalický , Martin Spišák , Vojtěch Vančura , Daniel Bohuněk , Rodrigo Alves , Pavel Kordík

Text embedding models enable semantic search, powering several NLP applications like Retrieval Augmented Generation by efficient information retrieval (IR). However, text embedding models are commonly studied in scenarios where the training…

Information Retrieval · Computer Science 2025-10-07 Dipam Goswami , Liying Wang , Bartłomiej Twardowski , Joost van de Weijer

Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor…

Information Retrieval · Computer Science 2021-10-13 Jingtao Zhan , Jiaxin Mao , Yiqun Liu , Jiafeng Guo , Min Zhang , Shaoping Ma

Contrastive learning has been the dominant approach to training dense retrieval models. In this work, we investigate the impact of ranking context - an often overlooked aspect of learning dense retrieval models. In particular, we examine…

Information Retrieval · Computer Science 2023-10-24 George Zerveas , Navid Rekabsaz , Daniel Cohen , Carsten Eickhoff

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. An index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun , Md. Hanif , Md. Rakib Uddin , Tanvir Ahmed , Md. Mofizul Islam

We present the first large-scale, cross-domain evaluation of document chunking strategies for dense retrieval, addressing a critical but underexplored aspect of retrieval-augmented systems. In our study, 36 segmentation methods spanning…

Computation and Language · Computer Science 2026-03-10 Muhammad Arslan Shaukat , Muntasir Adnan , Carlos C. N. Kuhn

A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational…

Machine Learning · Computer Science 2022-10-11 Jiawei Huang , Ruomin Huang , Wenjie Liu , Nikolaos M. Freris , Hu Ding

Dense retrievers powered by pretrained embeddings are widely used for document retrieval but struggle in specialized domains due to the mismatches between the training and target domain distributions. Domain adaptation typically requires…

Information Retrieval · Computer Science 2026-01-21 Chunsheng Zuo , Daniel Khashabi

In this paper, we introduce CoRet, a dense retrieval model designed for code-editing tasks that integrates code semantics, repository structure, and call graph dependencies. The model focuses on retrieving relevant portions of a code…

Machine Learning · Computer Science 2025-06-02 Fabio Fehr , Prabhu Teja Sivaprasad , Luca Franceschi , Giovanni Zappella

Dense retrieval models have become a standard for state-of-the-art information retrieval. However, their high-dimensional, high-precision (float32) vector embeddings create significant storage and memory challenges for real-world…

Information Retrieval · Computer Science 2025-11-19 Satyanarayan Pati

Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed as homogeneous and relaxed, where queries and…

Information Retrieval · Computer Science 2023-10-10 Anirudh Khatry , Yasharth Bajpai , Priyanshu Gupta , Sumit Gulwani , Ashish Tiwari

Embedding models are central to dense retrieval, semantic search, and recommendation systems, but their size often makes them impractical to deploy in resource-constrained environments such as browsers or edge devices. While smaller…

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signal has dominated the ad-hoc retrieval process, but it also has inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science 2020-10-21 Jingtao Zhan , Jiaxin Mao , Yiqun Liu , Min Zhang , Shaoping Ma

Dense retrieval, which encodes queries and documents into a single dense vector, has become the dominant neural retrieval approach due to its simplicity and compatibility with fast approximate nearest neighbor algorithms. As the tasks dense…

Information Retrieval · Computer Science 2026-02-06 Julian Killingback , Mahta Rafiee , Madine Manas , Hamed Zamani

While multi-vector retrieval models outperform single-vector models of comparable size in retrieval quality, their practicality is limited by substantially larger index sizes, driven by the additional sequence-length dimension in their…

Information Retrieval · Computer Science 2026-03-25 Rohan Jha , Chunsheng Zuo , Reno Kriz , Benjamin Van Durme

The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking. While effective and efficient, dual-encoders are…

Information Retrieval · Computer Science 2023-04-11 Daniel Campos , ChengXiang Zhai , Alessandro Magnani

We investigate improving the retrieval effectiveness of embedding models through the lens of corpus-specific fine-tuning. Prior work has shown that fine-tuning with queries generated using a dataset's retrieval corpus can boost retrieval…

Information Retrieval · Computer Science 2025-05-27 Manveer Singh Tamber , Suleman Kazi , Vivek Sourabh , Jimmy Lin

Dense retrievers encode queries and documents and map them in an embedding space using pre-trained language models. These embeddings need to be high-dimensional to fit training signals and guarantee the retrieval effectiveness of dense…

Information Retrieval · Computer Science 2022-10-25 Zhenghao Liu , Han Zhang , Chenyan Xiong , Zhiyuan Liu , Yu Gu , Xiaohua Li