Related papers: Improving BERT-based Query-by-Document Retrieval w…

QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort

The Query-By-Document (QBD) problem is an information retrieval problem where the query is a document, and the retrieved candidates are documents that match the query document, often in a domain or query specific manner. This can be crucial…

Information Retrieval · Computer Science 2025-05-09 Sriram Gopalakrishnan, Sunandita Patra

Co-BERT: A Context-Aware BERT Retrieval Model Incorporating Local and Query-specific Context

BERT-based text ranking models have dramatically advanced the state-of-the-art in ad-hoc retrieval, wherein most models tend to consider individual query-document pairs independently. In the meantime, the importance and usefulness to…

Information Retrieval · Computer Science 2021-04-20 Xiaoyang Chen, Kai Hui, Ben He, Xianpei Han, Le Sun, Zheng Ye

Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker

Retrieval with extremely long queries and documents is a well-known and challenging task in information retrieval and is commonly known as Query-by-Document (QBD) retrieval. Specifically designed Transformer models that can handle long…

Information Retrieval · Computer Science 2023-11-03 Arian Askari, Suzan Verberne, Amin Abolghasemi, Wessel Kraaij, Gabriella Pasi

BERT-QE: Contextualized Query Expansion for Document Re-ranking

Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods can suffer from introducing non-relevant information when expanding the query. To bridge this gap,…

Information Retrieval · Computer Science 2020-11-04 Zhi Zheng, Kai Hui, Ben He, Xianpei Han, Le Sun, Andrew Yates

An Analysis of a BERT Deep Learning Strategy on a Technology Assisted Review Task

Document screening is a central task within Evidence-Based Medicine, which is a clinical discipline that supplements scientific proof to back medical decisions. Given the recent advances in DL (Deep Learning) methods applied to Information…

Information Retrieval · Computer Science 2021-04-20 Alexandros Ioannidis

Revisiting Semantic Representation and Tree Search for Similar Question Retrieval

This paper studies the performance of BERT combined with a tree structure in a short-sentence ranking task. In a retrieval-based question answering system, we retrieve the question most similar to the query question by ranking all the questions…

Computation and Language · Computer Science 2019-09-09 Tong Guo, Huilin Gao

Learning-to-Rank with BERT in TF-Ranking

This paper describes a machine learning algorithm for document (re)ranking, in which queries and documents are firstly encoded using BERT [1], and on top of that a learning-to-rank (LTR) model constructed with TF-Ranking (TFR) [2] is…

Information Retrieval · Computer Science 2020-06-11 Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork

Simplified TinyBERT: Knowledge Distillation for Document Retrieval

Despite the effectiveness of utilizing the BERT model for document ranking, the high computational cost of such approaches limits their uses. To this end, this paper first empirically investigates the effectiveness of two knowledge…

Information Retrieval · Computer Science 2023-05-05 Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun

BERT-Embedding and Citation Network Analysis based Query Expansion Technique for Scholarly Search

The enormous growth of research publications has made it challenging for academic search engines to bring the most relevant papers against the given search query. Numerous solutions have been proposed over the years to improve the…

Information Retrieval · Computer Science 2023-01-27 Shah Khalid, Shah Khusro, Aftab Alam, Abdul Wahid

Pretrained Transformers for Text Ranking: BERT and Beyond

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural…

Information Retrieval · Computer Science 2021-08-20 Jimmy Lin, Rodrigo Nogueira, Andrew Yates

Query Embedding Pruning for Dense Retrieval

Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in…

Information Retrieval · Computer Science 2021-08-25 Nicola Tonellotto, Craig Macdonald

Document Expansion by Query Prediction

One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related or representative of the documents' content. From the perspective of a question answering system, this might comprise…

Information Retrieval · Computer Science 2019-09-26 Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

Task-Oriented Query Reformulation with Reinforcement Learning

Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query…

Information Retrieval · Computer Science 2017-09-26 Rodrigo Nogueira, Kyunghyun Cho

Groupwise Query Performance Prediction with BERT

While large-scale pre-trained language models like BERT have advanced the state-of-the-art in IR, their application in query performance prediction (QPP) is so far based on pointwise modeling of individual queries. Meanwhile, recent studies…

Information Retrieval · Computer Science 2022-04-26 Xiaoyang Chen, Ben He, Le Sun

Efficient Document Retrieval by End-to-End Refining and Quantizing BERT Embedding with Contrastive Product Quantization

Efficient document retrieval heavily relies on the technique of semantic hashing, which learns a binary code for every document and employs Hamming distance to evaluate document distances. However, existing semantic hashing methods are…

Information Retrieval · Computer Science 2022-11-01 Zexuan Qiu, Qinliang Su, Jianxing Yu, Shijing Si

Understanding the Behaviors of BERT in Ranking

This paper studies the performances and behaviors of BERT in ranking tasks. We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc…

Information Retrieval · Computer Science 2019-04-29 Yifan Qiao, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations

In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each…

Computation and Language · Computer Science 2019-06-11 Rui Zhang, Caitlin Westerfield, Sungrok Shim, Garrett Bingham, Alexander Fabbri, Neha Verma, William Hu, Dragomir Radev

Deep Reinforced Query Reformulation for Information Retrieval

Query reformulations have long been a key mechanism to alleviate the vocabulary-mismatch problem in information retrieval, for example by expanding the queries with related query terms or by generating paraphrases of the queries. In this…

Information Retrieval · Computer Science 2020-07-17 Xiao Wang, Craig Macdonald, Iadh Ounis

BERT Rankers are Brittle: a Study using Adversarial Document Perturbations

Contextual ranking models based on BERT are now well established for a wide range of passage and document ranking tasks. However, the robustness of BERT-based ranking models under adversarial inputs is under-explored. In this paper, we…

Information Retrieval · Computer Science 2022-06-24 Yumeng Wang, Lijun Lyu, Avishek Anand

Cross-Lingual Relevance Transfer for Document Retrieval

Recent work has shown the surprising ability of multi-lingual BERT to serve as a zero-shot cross-lingual transfer model for a number of language processing tasks. We combine this finding with a similarly recent proposal on sentence-level…

Information Retrieval · Computer Science 2019-11-11 Peng Shi, Jimmy Lin