Related papers: Learning Term Discrimination

Finding Inverse Document Frequency Information in BERT

For many decades, BM25 and its variants have been the dominant document retrieval approach, where their two underlying features are Term Frequency (TF) and Inverse Document Frequency (IDF). The traditional approach, however, is being…

Information Retrieval · Computer Science 2022-02-25 Jaekeol Choi, Euna Jung, Sungjun Lim, Wonjong Rhee

Semantic-Sensitive Web Information Retrieval Model for HTML Documents

With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites and this number is exponentially increasing every day. Fundamentally, Information…

Information Retrieval · Computer Science 2012-04-03 Youssef Bassil, Paul Semaan

Testing different Log Bases For Vector Model Weighting Technique

Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the…

Information Retrieval · Computer Science 2023-07-13 Kamel Assaf

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers and SVMs. The widely used term weighting scheme in text categorization, i.e., tf.idf, originated in information retrieval…

Machine Learning · Computer Science 2012-06-07 Deqing Wang, Hui Zhang

Recurrent Neural Network Language Model Adaptation Derived Document Vector

In many natural language processing (NLP) tasks, a document is commonly modeled as a bag of words using the term frequency-inverse document frequency (TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature vector is…

Computation and Language · Computer Science 2016-12-15 Wei Li, Brian Kan Wing Mak

Learning Passage Impacts for Inverted Indexes

Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as…

Information Retrieval · Computer Science 2021-04-27 Antonio Mallia, Omar Khattab, Nicola Tonellotto, Torsten Suel

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…

Information Retrieval · Computer Science 2023-10-18 Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

The Potential of Learned Index Structures for Index Compression

Inverted indexes are vital in providing fast keyword-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency,…

Information Retrieval · Computer Science 2019-01-30 Harrie Oosterhuis, J. Shane Culpepper, Maarten de Rijke

Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks

Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document scores…

Information Retrieval · Computer Science 2019-07-09 Bhaskar Mitra, Corby Rosset, David Hawking, Nick Craswell, Fernando Diaz, Emine Yilmaz

Improving Neural Ranking Models with Traditional IR Methods

Neural ranking methods based on large transformer models have recently gained significant attention in the information retrieval community, and have been adopted by major commercial solutions. Nevertheless, they are computationally…

Information Retrieval · Computer Science 2023-08-30 Anik Saha, Oktie Hassanzadeh, Alex Gittens, Jian Ni, Kavitha Srinivas, Bulent Yener

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signals have dominated the ad-hoc retrieval process, but they also have inherent defects, such as the vocabulary mismatch problem.…

Information Retrieval · Computer Science 2020-10-21 Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma

Neural Models for Information Retrieval

Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning to rank models employ machine learning techniques over hand-crafted IR features. By…

Information Retrieval · Computer Science 2017-05-04 Bhaskar Mitra, Nick Craswell

A Novel Term Weighing Scheme Towards Efficient Crawl of Textual Databases

The Hidden Web is the vast repository of informational databases available only through search form interfaces, accessible by typing a set of keywords into the search forms. Typically, a Hidden Web crawler is employed to autonomously…

Information Retrieval · Computer Science 2013-11-05 Sonali Gupta, Komal Kumar Bhatia

Unsupervised Identification of Relevant Prior Cases

Document retrieval plays a role in almost all domains of knowledge understanding, including the legal domain. Precedent refers to a court decision that is considered as authority for deciding subsequent cases involving identical or…

Information Retrieval · Computer Science 2021-07-20 Shivangi Bithel, Sumitra S Malagi

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

Keyword extraction has received increasing attention as an important research topic that can lead to advances in diverse applications such as document context categorization, text indexing and document classification. In this…

Information Retrieval · Computer Science 2021-01-27 Amir Jalilifard, Vinicius F. Caridá, Alex F. Mansano, Rogers S. Cristo, Felipe Penhorate C. da Fonseca

Efficient Neural Ranking using Forward Indexes

Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource- and time-intensive. In this paper,…

Information Retrieval · Computer Science 2022-04-05 Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

(De)-Indexing and the Right to be Forgotten

In the digital age, the challenge of forgetfulness has emerged as a significant concern, particularly regarding the management of personal data and its accessibility online. The right to be forgotten (RTBF) allows individuals to request the…

Computers and Society · Computer Science 2025-01-08 Salvatore Vilella, Giancarlo Ruffo

Learning to Weight for Text Classification

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the…

Machine Learning · Computer Science 2021-09-22 Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani

Document Classification using File Names

Rapid document classification is critical in several time-sensitive applications like digital forensics and large-scale media classification. Traditional approaches that rely on heavy-duty deep learning models fall short due to high…

Computation and Language · Computer Science 2025-03-07 Zhijian Li, Stefan Larson, Kevin Leach

Bottleneck-Minimal Indexing for Generative Document Retrieval

We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be…

Information Retrieval · Computer Science 2024-05-22 Xin Du, Lixin Xiu, Kumiko Tanaka-Ishii