Related papers: Efficient Document Re-Ranking for Transformers by …

SDR: Efficient Neural Re-ranking using Succinct Document Representation

BERT-based ranking models have achieved superior performance on various information retrieval tasks. However, the large number of parameters and complex self-attention operation come at a significant latency overhead. To remedy this, recent…

Information Retrieval · Computer Science · 2021-10-06 · Nachshon Cohen, Amit Portnoy, Besnik Fetahu, Amir Ingber

Pretrained Transformers for Text Ranking: BERT and Beyond

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural…

Information Retrieval · Computer Science · 2021-08-20 · Jimmy Lin, Rodrigo Nogueira, Andrew Yates

Layer-wise Token Compression for Efficient Document Reranking

Transformer-based document cross-encoder rerankers are a central component of modern information retrieval systems. Despite their success, these models suffer from high computational costs due to processing long query-document sequences at…

Information Retrieval · Computer Science · 2026-05-22 · Shengyao Zhuang, Zhichao Xu, Ivano Lauriola

Reranking with Compressed Document Representation

Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially with Large Language Models. Borrowing from recent advances in document compression for RAG, we reduce the…

Information Retrieval · Computer Science · 2025-05-22 · Hervé Déjean, Stéphane Clinchant

Efficient Listwise Reranking with Compressed Document Representations

Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing…

Information Retrieval · Computer Science · 2026-04-30 · Hervé Déjean, Stéphane Clinchant

Retrieval for Extremely Long Queries and Documents with RPRS: a Highly Efficient and Effective Transformer-based Re-Ranker

Retrieval with extremely long queries and documents is a well-known and challenging task in information retrieval and is commonly known as Query-by-Document (QBD) retrieval. Specifically designed Transformer models that can handle long…

Information Retrieval · Computer Science · 2023-11-03 · Arian Askari, Suzan Verberne, Amin Abolghasemi, Wessel Kraaij, Gabriella Pasi

Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking

Transformer-based re-ranking models can achieve high search relevance through context-aware soft matching of query tokens with document tokens. To alleviate runtime complexity of such inference, previous work has adopted a late interaction…

Information Retrieval · Computer Science · 2022-03-30 · Yingrui Yang, Yifan Qiao, Tao Yang

ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

Reranking is fundamental to information retrieval and retrieval-augmented generation, with recent Large Language Models (LLMs) significantly advancing reranking quality. Most current works rely on large-scale LLMs (>7B parameters),…

Information Retrieval · Computer Science · 2026-04-17 · Xianming Li, Aamir Shakir, Rui Huang, Tsz-fung Andrew Lee, Julius Lipp, Benjamin Clavié, Jing Li

How Different are Pre-trained Transformers for Text Ranking?

In recent years, large pre-trained transformers have led to substantial gains in performance over traditional retrieval models and feedback approaches. However, these results are primarily based on the MS Marco/TREC Deep Learning Track…

Information Retrieval · Computer Science · 2022-04-18 · David Rau, Jaap Kamps

Topic-Grained Text Representation-based Model for Document Retrieval

Document retrieval enables users to find their required documents accurately and quickly. To satisfy the requirement of retrieval efficiency, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online…

Information Retrieval · Computer Science · 2022-07-12 · Mengxue Du, Shasha Li, Jie Yu, Jun Ma, Bin Ji, Huijun Liu, Wuhang Lin, Zibo Yi

Towards Efficient Active Learning in NLP via Pretrained Representations

Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications. When labeled documents are scarce, active learning helps save annotation efforts but requires retraining of massive…

Machine Learning · Computer Science · 2024-02-27 · Artem Vysogorets, Achintya Gopal

Learning-to-Rank with BERT in TF-Ranking

This paper describes a machine learning algorithm for document (re)ranking, in which queries and documents are firstly encoded using BERT [1], and on top of that a learning-to-rank (LTR) model constructed with TF-Ranking (TFR) [2] is…

Information Retrieval · Computer Science · 2020-06-11 · Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing…

Machine Learning · Computer Science · 2022-07-18 · Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

Local Self-Attention over Long Text for Efficient Document Retrieval

Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing…

Information Retrieval · Computer Science · 2020-05-12 · Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

Long Document Ranking with Query-Directed Sparse Transformer

The computing cost of transformer self-attention often necessitates breaking long documents to fit in pretrained models in document ranking tasks. In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures…

Artificial Intelligence · Computer Science · 2020-10-27 · Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang

Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking

Search engines operate under a strict time constraint as a fast response is paramount to user satisfaction. Thus, neural re-ranking models have a limited time-budget to re-rank documents. Given the same amount of time, a faster re-ranking…

Information Retrieval · Computer Science · 2020-02-06 · Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

Towards Better Web Search Performance: Pre-training, Fine-tuning and Learning to Rank

This paper describes the approach of the THUIR team at the WSDM Cup 2023 Pre-training for Web Search task. This task requires the participant to rank the relevant documents for each query. We propose a new data pre-processing method and…

Information Retrieval · Computer Science · 2023-03-09 · Haitao Li, Jia Chen, Weihang Su, Qingyao Ai, Yiqun Liu

Efficient Neural Ranking using Forward Indexes

Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. In this paper,…

Information Retrieval · Computer Science · 2022-04-05 · Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

Transformers from Compressed Representations

Compressed file formats are the cornerstone of efficient data storage and transmission, yet their potential for representation learning remains largely underexplored. We introduce TEMPEST (TransformErs froM comPressed rEpreSenTations), a…

Machine Learning · Computer Science · 2025-10-30 · Juan C. Leon Alcazar, Mattia Soldan, Mohammad Saatialsoruji, Alejandro Pardo, Hani Itani, Juan Camilo Perez, Bernard Ghanem

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to…

Computation and Language · Computer Science · 2024-05-06 · Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi