Related papers: DREQ: Document Re-Ranking Using Entity-based Query…
Neural IR has advanced through two distinct paths: entity-oriented approaches leveraging knowledge graphs and multi-vector models capturing fine-grained semantics. We introduce QDER, a neural re-ranking model that unifies these approaches…
Current neural re-rankers often struggle with complex information needs and long, content-rich documents. The fundamental issue is not computational--it is intelligent content selection: identifying what matters in lengthy, multi-faceted…
This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are…
The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the…
Dense encoders and LLM-based rerankers struggle with long documents: single-vector representations dilute fine-grained relevance, while cross-encoders are often too expensive for practical reranking. We present an efficient long-document…
News articles usually contain knowledge entities such as celebrities or organizations. Important entities in articles carry key messages and help to understand the content in a more direct way. An industrial news recommender system contains…
Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous…
We explore several new models for document relevance ranking, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016). Unlike DRMM, which uses context-insensitive encodings of terms and query-document term interactions,…
Document-level relation extraction (RE), which requires reasoning on multiple entities in different sentences to identify complex inter-sentence relations, is more challenging than sentence-level RE. To extract the complex inter-sentence…
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review. This saves reviewing effort if paired with a stopping criterion, and speeds up review…
What if Information Retrieval (IR) systems did not just retrieve relevant information that is stored in their indices, but could also "understand" it and synthesise it into a single document? We present a preliminary study that makes a…
Relation Extraction (RE) is a fundamental task of information extraction, which has attracted a large amount of research attention. Previous studies focus on extracting the relations within a sentence or document, while currently…
Accurate document retrieval is crucial for the success of retrieval-augmented generation (RAG) applications, including open-domain question answering and code completion. While large language models (LLMs) have been employed as dense…
Entity-oriented retrieval assumes that relevant documents exhibit query-relevant entities, yet evaluations report conflicting results. We show this inconsistency stems not from model failure, but from evaluation. On TREC Robust04, we…
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types (e.g., organization or person name). Recently, many works have been proposed to shape the NER as a machine reading comprehension…
Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs. Despite recent progress, existing approaches often fall short in two key aspects: richness of representation…
Referring Expression Comprehension (REC) aims to localize specified entities or regions in an image based on natural language descriptions. While existing methods handle single-entity localization, they often ignore complex inter-entity…
Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. Early work breaks the documents into short passage-like chunks. These chunks are independently mapped to scalar scores or…
Large Language Models (LLMs) have shown strong capabilities in document re-ranking, a key component in modern Information Retrieval (IR) systems. However, existing LLM-based approaches face notable limitations, including ranking…
Re-ranking systems aim to reorder an initial list of documents to satisfy better the information needs associated with a user-provided query. Modern re-rankers predominantly rely on neural network models, which have proven highly effective…