Related papers: An efficient algorithm for three-component key ind…

An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task is time-consuming when the query consists of frequently occurring words. If we…

Information Retrieval · Computer Science 2020-09-08 Alexander B. Veretennikov

Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For…

Information Retrieval · Computer Science 2019-07-11 Alexander B. Veretennikov

Proximity full-text searches of frequently occurring words with a response time guarantee

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For…

Information Retrieval · Computer Science 2020-09-09 Alexander B. Veretennikov

Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, then multi-component key indexes improve the search speed compared with ordinary inverted indexes. It was…

Information Retrieval · Computer Science 2021-08-03 Alexander B. Veretennikov

Selection of Optimal Parameters in the Fast K-Word Proximity Search Based on Multi-component Key Indexes

Proximity full-text search is commonly implemented in contemporary full-text search systems. Let us assume that the search query is a list of words. It is natural to consider a document as relevant if the queried words are near each other…

Information Retrieval · Computer Science 2021-01-12 Alexander B. Veretennikov

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each…

Information Retrieval · Computer Science 2018-11-20 Alexander B. Veretennikov

About a structure of easily updatable full-text indexes

We consider strategies to organize easily updatable associative arrays in external memory. These arrays are used for full-text search. We study indexes with different keys: single word form, two word forms, and sequences of word forms. The…

Information Retrieval · Computer Science 2020-07-21 Alexander B. Veretennikov

Efficient Immediate-Access Dynamic Indexing

In a dynamic retrieval system, documents must be ingested as they arrive, and be immediately findable by queries. Our purpose in this paper is to describe an index structure and processing regime that accommodates that requirement for…

Information Retrieval · Computer Science 2023-01-12 Alistair Moffat, Joel Mackenzie

Document clustering with evolved multiword search queries

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Using Additional Indexes for Fast Full-Text Search of Phrases That Contain Frequently Used Words

Searches for phrases and word sets in large text arrays by means of additional indexes are considered. Their use may reduce the query-processing time by an order of magnitude in comparison with standard inverted files.

Information Retrieval · Computer Science 2018-11-27 A. B. Veretennikov

Faster Exact Search using Document Clustering

We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD: SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses…

Information Retrieval · Computer Science 2014-11-06 Jonathan Dimond, Peter Sanders

Techniques for Inverted Index Compression

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun, Md. Hanif, Md. Rakib Uddin, Tanvir Ahmed, Md. Mofizul Islam

The Potential of Learned Index Structures for Index Compression

Inverted indexes are vital in providing fast keyword-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency,…

Information Retrieval · Computer Science 2019-01-30 Harrie Oosterhuis, J. Shane Culpepper, Maarten de Rijke

Anytime Ranking on Document-Ordered Indexes

Inverted indexes continue to be a mainstay of text search engines, allowing efficient querying of large document collections. While there are a number of possible organizations, document-ordered indexes are the most common, since they are…

Information Retrieval · Computer Science 2021-06-14 Joel Mackenzie, Matthias Petri, Alistair Moffat

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…

Information Retrieval · Computer Science 2023-10-18 Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

Remedies against the Vocabulary Gap in Information Retrieval

Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. Text, whether a document or a query, is represented by a bag of its words that ignores grammar and word order, but retains word frequency…

Information Retrieval · Computer Science 2017-11-17 Christophe Van Gysel

Element Retrieval using Namespace Based on keyword search over XML Documents

Querying over XML elements using keyword search is steadily gaining popularity. The traditional similarity measure is widely employed in order to effectively retrieve various XML documents. A number of authors have already proposed…

Information Retrieval · Computer Science 2010-12-20 Yang Wang, Zhikui Chen, Xiaodi Huang

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

A major difficulty in applying word vector embeddings in IR is in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents (in comparison to the atomic words), for the…

Information Retrieval · Computer Science 2016-06-28 Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, Gareth J. F. Jones

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast, high-quality document clustering is an important task in organizing information and the search engine results obtained from user queries, and in enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar