Related papers: Layer-wise Token Compression for Efficient Documen…

Efficient Listwise Reranking with Compressed Document Representations

Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing…

Information Retrieval · Computer Science 2026-04-30 Hervé Déjean, Stéphane Clinchant

ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM…

Information Retrieval · Computer Science 2026-04-27 Xiaojie Ke, Shuai Zhang, Liansheng Sun, Yongjin Wang, Hengjun Jiang, Xiangkun Liu, Cunxin Gu, Jian Xu, Guanjun Jiang

Reranking with Compressed Document Representation

Reranking, the process of refining the output of a first-stage retriever, is often considered computationally expensive, especially with Large Language Models. Borrowing from recent advances in document compression for RAG, we reduce the…

Information Retrieval · Computer Science 2025-05-22 Hervé Déjean, Stéphane Clinchant

Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking

Transformer based re-ranking models can achieve high search relevance through context-aware soft matching of query tokens with document tokens. To alleviate runtime complexity of such inference, previous work has adopted a late interaction…

Information Retrieval · Computer Science 2022-03-30 Yingrui Yang, Yifan Qiao, Tao Yang

Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses deem them cost-prohibitive in practice. Our proposed approach, called…

Information Retrieval · Computer Science 2020-05-27 Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference

Although large language models (LLM) have achieved remarkable performance, their enormous parameter counts hinder deployment on resource-constrained hardware. Low-rank compression can reduce both memory usage and computational demand, but…

Computation and Language · Computer Science 2025-10-13 Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu

ListConRanker: A Contrastive Text Reranker with Listwise Encoding

Reranker models aim to re-rank the passages based on the semantic similarity between the given query and passages, which have recently received more attention due to the wide application of the Retrieval-Augmented Generation. Most previous…

Computation and Language · Computer Science 2025-01-14 Junlong Liu, Yue Ma, Ruihui Zhao, Junhao Zheng, Qianli Ma, Yangyang Kang

Investigating the Effects of Sparse Attention on Cross-Encoders

Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classic retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient. However, these…

Information Retrieval · Computer Science 2024-03-21 Ferdinand Schlatt, Maik Fröbe, Matthias Hagen

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning

The increasing prevalence of large language models (LLMs) such as GPT-4 in various applications has led to a surge in the size of prompts required for optimal performance, leading to challenges in computational efficiency. Prompt…

Computation and Language · Computer Science 2024-12-19 Shivam Shandilya, Menglin Xia, Supriyo Ghosh, Huiqiang Jiang, Jue Zhang, Qianhui Wu, Victor Rühle

Efficient Long-Document Reranking via Block-Level Embeddings and Top-k Interaction Refinement

Dense encoders and LLM-based rerankers struggle with long documents: single-vector representations dilute fine-grained relevance, while cross-encoders are often too expensive for practical reranking. We present an efficient long-document…

Information Retrieval · Computer Science 2026-02-06 Minghan Li, Eric Gaussier, Guodong Zhou

Efficient Token Compression for Vision Transformer with Spatial Information Preserved

Token compression is essential for reducing the computational and memory requirements of transformer models, enabling their deployment in resource-constrained environments. In this work, we propose an efficient and hardware-compatible token…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua

ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model

Tokenizer is an essential component for large language models (LLMs), and a tokenizer with a high compression rate can improve the model's representation and processing efficiency. However, the tokenizer cannot ensure high compression rate…

Computation and Language · Computer Science 2024-10-08 Shuhao Gu, Mengdi Zhao, Bowen Zhang, Liangdong Wang, Jijie Li, Guang Liu

Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings

Large language models produce powerful text embeddings, but their causal attention mechanism restricts the flow of information from later to earlier tokens, degrading representation quality. While recent methods attempt to solve this by…

Computation and Language · Computer Science 2025-11-20 Xueying Ding, Xingyue Huang, Mingxuan Ju, Liam Collins, Yozen Liu, Leman Akoglu, Neil Shah, Tong Zhao

Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision

Effective document reranking is essential for improving search relevance across diverse applications. While Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning, their high computational…

Computation and Language · Computer Science 2025-10-03 Dimitar Peshevski, Kiril Blazhevski, Martin Popovski, Gjorgji Madjarov

Drowning in Documents: Consequences of Scaling Reranker Inference

Rerankers, typically cross-encoders, are computationally intensive but are frequently used because they are widely assumed to outperform cheaper initial IR systems. We challenge this assumption by measuring reranker performance for full…

Information Retrieval · Computer Science 2025-07-14 Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov

Long Document Summarization with Top-down and Bottom-up Inference

Text summarization aims to condense long documents and retain key information. Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents. Most recent…

Computation and Language · Computer Science 2022-03-16 Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong

CoRanking: Collaborative Ranking with Small and Large Ranking Agents

Large Language Models (LLMs) have demonstrated superior listwise ranking performance. However, their superior performance often relies on large-scale parameters (e.g., GPT-4) and a repetitive sliding window process, which introduces…

Computation and Language · Computer Science 2025-09-03 Wenhan Liu, Xinyu Ma, Yutao Zhu, Lixin Su, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

Repository-level code intelligence tasks require large language models (LLMs) to process long, multi-file contexts. Such inputs introduce three challenges: crucial context can be obscured by noise, truncated due to limited windows, and…

Software Engineering · Computer Science 2026-04-16 Jia Feng, Zhanyue Qin, Cuiyun Gao, Ruiqi Wang, Chaozheng Wang, Yingwei Ma, Xiaoyuan Xie

An Information-Theoretic Perspective on LLM Tokenizers

Large language model (LLM) tokenizers act as structured compressors: by mapping text to discrete token sequences, they determine token count (and thus compute and context usage) and the statistical structure seen by downstream models.…

Information Theory · Computer Science 2026-01-15 Mete Erdogan, Abhiram Gorle, Shubham Chandak, Mert Pilanci, Tsachy Weissman

Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

Recent studies have demonstrated the effectiveness of using large language models (LLMs) in passage ranking. The listwise approaches, such as RankGPT, have become new state-of-the-art in this task. However, the efficiency of…

Computation and Language · Computer Science 2025-01-29 Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao