Related papers: Topic-Grained Text Representation-based Model for …

Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI

Retrieving and extracting knowledge from extensive research documents and large databases presents significant challenges for researchers, students, and professionals in today's information-rich era. Existing retrieval systems, which rely…

Information Retrieval · Computer Science · 2025-02-06 · Mohammed-Khalil Ghali, Abdelrahman Farrag, Daehan Won, Yu Jin

Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expense renders them cost-prohibitive in practice. Our proposed approach, called…

Information Retrieval · Computer Science · 2020-05-27 · Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

Topic models have been the prominent tools for automatic topic discovery from text corpora. Despite their effectiveness, topic models suffer from several limitations, including the inability to model word-ordering information in…

Computation and Language · Computer Science · 2022-02-10 · Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Ad-hoc search calls for the selection of appropriate answers from a massive-scale corpus. Embedding-based retrieval (EBR) has become a promising solution, in which deep-learning-based document representations and ANN search…

Information Retrieval · Computer Science · 2022-03-03 · Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Yingxia Shao, Defu Lian, Chaozhuo Li, Hao Sun, Denvy Deng, Liangjie Zhang, Qi Zhang, Xing Xie

Text Embeddings for Retrieval From a Large Knowledge Base

Text embeddings, which represent natural language documents in a semantic vector space, can be used for document retrieval via nearest-neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a…

Information Retrieval · Computer Science · 2019-05-03 · Tolgahan Cakaloglu, Christian Szegedy, Xiaowei Xu
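
A minimal sketch of the nearest-neighbor lookup described in the entry above. It is not the authors' method. It assumes documents have already been embedded: the three-dimensional vectors and the document IDs are made-up toy values rather than the output of any trained encoder. It ranks documents by exact cosine similarity using a linear scan. Large-scale systems typically replace this scan with approximate nearest-neighbor (ANN) search.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query: Vector, index: Dict[str, Vector], k: int = 1) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor lookup by cosine similarity."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical toy document embeddings; a real system would obtain these from a text encoder.
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.1, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]  # hypothetical query embedding

for doc_id, score in nearest_neighbors(query, index, k=2):
    print(doc_id, round(score, 3))  # doc_a ranks first, then doc_b
```

Because the query vector points mostly along the first axis, `doc_a` is the closest neighbor. Each query costs time linear in the number of documents, which is why ANN indexes matter at scale.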

Generative Dense Retrieval: Memory Can Be a Burden

Generative Retrieval (GR), autoregressively decoding relevant document identifiers given a query, has been shown to perform well under the setting of small-scale corpora. By memorizing the document corpus with model parameters, GR…

Information Retrieval · Computer Science · 2024-01-22 · Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, Heda Wang, Xupeng Miao, Kan Li

Fine-Grained Table Retrieval Through the Lens of Complex Queries

Enabling question answering over tables and databases in natural language has become a key capability in the democratization of insights from tabular data sources. These systems first require retrieval of data that is relevant to a given…

Information Retrieval · Computer Science · 2026-03-10 · Wojciech Kosiuk, Xingyu Ji, Yeounoh Chung, Fatma Özcan, Madelon Hulsebos

Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy

Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs). However, their effectiveness is often limited in theme-specific applications for specialized areas or industries, due to…

Information Retrieval · Computer Science · 2024-03-08 · SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, Jiawei Han

Remedies against the Vocabulary Gap in Information Retrieval

Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. Text (a document or a query) is represented by a bag of its words that ignores grammar and word order but retains word frequency…

Information Retrieval · Computer Science · 2017-11-17 · Christophe Van Gysel
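
A minimal sketch of the bag-of-words representation described in the entry above. It is not the author's code. It assumes a naive lowercase whitespace tokenizer. `term_overlap` is a hypothetical toy scoring function, not a standard ranking model. It counts how often the query's terms occur in the document, and it shows the vocabulary gap: a query and a document with the same meaning but different words score zero.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map text to a multiset of tokens: word order is discarded, frequency is kept.

    Tokenization is a naive lowercase whitespace split (an assumption for illustration).
    """
    return Counter(text.lower().split())


def term_overlap(query: str, document: str) -> int:
    """Hypothetical toy score: total occurrences of the query's distinct terms in the document."""
    q = bag_of_words(query)
    d = bag_of_words(document)
    return sum(d[term] for term in q)  # Counter returns 0 for missing terms


# Word order is ignored: both strings produce the same bag.
print(bag_of_words("the cat sat on the mat") == bag_of_words("mat the on sat cat the"))  # True
print(bag_of_words("the cat sat on the mat")["the"])  # 2 (frequency is retained)

# The vocabulary gap: synonyms do not match under exact term overlap.
print(term_overlap("automobile repair", "car repair shop"))  # 1 (only "repair" matches)
print(term_overlap("cheap automobile", "affordable car"))  # 0
```

The final example scores zero even though the texts are closely related. This mismatch is the kind of vocabulary gap that the entry's title refers to.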

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

The search engine has become a fundamental component in various web and mobile applications. Retrieving relevant documents from massive datasets is challenging for a search engine system, especially when faced with verbose or tail queries.…

Information Retrieval · Computer Science · 2020-08-11 · Kuan Fang, Long Zhao, Zhan Shen, RuiXing Wang, RiKang Zhour, LiWen Fan

Efficient Conversational Search via Topical Locality in Dense Retrieval

Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction,…

Information Retrieval · Computer Science · 2025-05-01 · Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Guido Rocchietti, Cosimo Rulli

SDR: Efficient Neural Re-ranking using Succinct Document Representation

BERT-based ranking models have achieved superior performance on various information retrieval tasks. However, their large number of parameters and complex self-attention operations incur significant latency overhead. To remedy this, recent…

Information Retrieval · Computer Science · 2021-10-06 · Nachshon Cohen, Amit Portnoy, Besnik Fetahu, Amir Ingber

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science · 2016-08-09 · Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Retrieving Complex Tables with Multi-Granular Graph Representation Learning

The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries. Existing learning systems for this task often treat tables as plain text based on the assumption that…

Information Retrieval · Computer Science · 2021-05-06 · Fei Wang, Kexuan Sun, Muhao Chen, Jay Pujara, Pedro Szekely

A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models

Existing neural ranking models follow the text matching paradigm, where document-to-query relevance is estimated through predicting the matching score. Drawing from the rich literature of classical generative retrieval models, we introduce…

Information Retrieval · Computer Science · 2021-06-28 · Oleg Lesota, Navid Rekabsaz, Daniel Cohen, Klaus Antonius Grasserbauer, Carsten Eickhoff, Markus Schedl

Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

Recently, retrieval models based on dense representations have gradually been applied to the first stage of document retrieval tasks, showing better performance than traditional sparse vector space models. To obtain high efficiency,…

Information Retrieval · Computer Science · 2021-08-20 · Hongyin Tang, Xingwu Sun, Beihong Jin, Jingang Wang, Fuzheng Zhang, Wei Wu

Fine-Grained Distillation for Long Document Retrieval

Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto way to improve a retriever by having it mimic a heterogeneous yet powerful cross-encoder. However, in…

Information Retrieval · Computer Science · 2022-12-21 · Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Guodong Long, Can Xu, Daxin Jiang

DSRIM: A Deep Neural Information Retrieval Model Enhanced by a Knowledge Resource Driven Representation of Documents

The state-of-the-art solutions to the vocabulary mismatch in information retrieval (IR) mainly aim at leveraging either the relational semantics provided by external resources or the distributional semantics, recently investigated by deep…

Information Retrieval · Computer Science · 2017-07-28 · Gia-Hung Nguyen, Laure Soulier, Lynda Tamine, Nathalie Bricon-Souf

Learning to Match Using Local and Distributed Representations of Text for Web Search

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on…

Information Retrieval · Computer Science · 2016-10-27 · Bhaskar Mitra, Fernando Diaz, Nick Craswell

TPRM: A Topic-based Personalized Ranking Model for Web Search

Ranking models have achieved promising results, but it remains challenging to design personalized ranking systems to leverage user profiles and semantic representations between queries and documents. In this paper, we propose a topic-based…

Information Retrieval · Computer Science · 2021-08-16 · Minghui Huang, Wei Peng, Dong Wang