Related papers: Techniques for Inverted Index Compression

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun, Md. Hanif, Md. Rakib Uddin, Tanvir Ahmed, Md. Mofizul Islam

Compressing integer lists with Contextual Arithmetic Trits

Inverted indexes make it possible to query large databases without searching the database itself at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…

Databases · Computer Science 2025-05-06 Yann Barsamian, André Chailloux

The Potential of Learned Index Structures for Index Compression

Inverted indexes are vital in providing fast keyword-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency,…

Information Retrieval · Computer Science 2019-01-30 Harrie Oosterhuis, J. Shane Culpepper, Maarten de Rijke

Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections

For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs…

Information Retrieval · Computer Science 2013-05-06 Nima Asadi, Jimmy Lin

On the Impact of Random Index-Partitioning on Index Compression

The performance of processing search queries depends heavily on the stored index size. Accordingly, considerable research efforts have been devoted to the development of efficient compression techniques for inverted indexes. Roughly, index…

Information Retrieval · Computer Science 2011-07-29 M. Feldman, R. Lempel, O. Somekh, K. Vornovitsky

Factorization-based Lossless Compression of Inverted Indices

Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements…

Information Retrieval · Computer Science 2015-03-19 George Beskales, Marcus Fontoura, Maxim Gurevich, Sergei Vassilvitskii, Vanja Josifovski

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…

Information Retrieval · Computer Science 2023-10-18 Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

Universal Indexes for Highly Repetitive Document Collections

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro

Using Additional Indexes for Fast Full-Text Search of Phrases That Contain Frequently Used Words

Searches for phrases and word sets in large text arrays by means of additional indexes are considered. Their use may reduce the query-processing time by an order of magnitude in comparison with standard inverted files.

Information Retrieval · Computer Science 2018-11-27 A. B. Veretennikov

On Slicing Sorted Integer Sequences

Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning…

Information Retrieval · Computer Science 2019-07-23 Giulio Ermanno Pibiri

Compressed Key Sort and Fast Index Reconstruction

In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…

Databases · Computer Science 2020-09-25 Yongsik Kwon, Cheol Ryu, Sang Kyun Cha, Arthur H. Lee, Kunsoo Park, Bongki Moon

Re-Pair Compression of Inverted Lists

Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…

Information Retrieval · Computer Science 2009-11-18 Francisco Claude, Antonio Farina, Gonzalo Navarro

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is a very important feature for saving memory space. In this proposal, an indexed-dictionary-based compression scheme is used for text data, where each word's reference in the dictionary is used for compression. This approach…

Other Computer Science · Computer Science 2015-12-23 Vivek Dimri, Prof. Ranjit Biswas

Inverted Semantic-Index for Image Retrieval

This paper addresses the construction of an inverted index for large-scale image retrieval. The inverted index proposed by J. Sivic brings a significant acceleration by limiting distance computations to only a small fraction of the database.…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Ying Wang

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

SIMD Compression and the Intersection of Sorted Integers

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…

Information Retrieval · Computer Science 2020-04-22 Daniel Lemire, Leonid Boytsov, Nathan Kurz

Learning a Complete Image Indexing Pipeline

To work at scale, a complete image indexing system comprises two components: an inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query; an approximate distance…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Himalaya Jain, Joaquin Zepeda, Patrick Pérez, Rémi Gribonval

Quasi-Succinct Indices

Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller…

Information Retrieval · Computer Science 2012-06-20 Sebastiano Vigna

Reordering Columns for Smaller Indexes

Column-oriented indexes (such as projection or bitmap indexes) are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…

Databases · Computer Science 2015-03-13 Daniel Lemire, Owen Kaser

On Optimally Partitioning Variable-Byte Codes

The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini
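
Several of the abstracts above refer to gap compression and Variable-Byte encoding of sorted document identifiers. As a rough illustration only, the following Python sketch shows one way the two ideas can fit together; it is not taken from any of the listed papers. The function names (`to_gaps`, `from_gaps`, `vbyte_encode`, `vbyte_decode`) are hypothetical, and the byte layout (7 payload bits per byte, least significant group first, high bit set on the final byte of each integer) is just one of several conventions in use.

```python
from typing import Iterable, List


def to_gaps(doc_ids: Iterable[int]) -> List[int]:
    """Replace each ID in a non-decreasing list by its difference from the previous ID."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        if doc_id < prev:
            raise ValueError("document IDs must be non-negative and sorted")
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Invert to_gaps by taking a running sum."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(values: Iterable[int]) -> bytes:
    """Encode non-negative integers using 7 payload bits per byte (assumed layout)."""
    out = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("only non-negative integers can be encoded")
        while value >= 128:
            out.append(value & 0x7F)  # low 7 bits, high bit clear: more bytes follow
            value >>= 7
        out.append(value | 0x80)      # high bit set: last byte of this integer
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:               # terminator byte: emit and reset
            values.append(current)
            current = 0
            shift = 0
        else:
            shift += 7
    return values


if __name__ == "__main__":
    postings = [3, 7, 8, 300, 100000]
    encoded = vbyte_encode(to_gaps(postings))
    print(len(encoded), "bytes for", len(postings), "document IDs")
    assert from_gaps(vbyte_decode(encoded)) == postings
```

Small gaps fit in a single byte, which is why storing differences rather than absolute identifiers tends to help on dense posting lists; larger gaps fall back to two or more bytes.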