Related papers: Techniques for Inverted Index Compression

200 papers

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun , Md. Hanif , Md. Rakib Uddin , Tanvir Ahmed , Md. Mofizul Islam

Inverted indexes allow querying large databases without searching the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…

Databases · Computer Science 2025-05-06 Yann Barsamian , André Chailloux

Inverted indexes are vital in providing fast keyword-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency,…

Information Retrieval · Computer Science 2019-01-30 Harrie Oosterhuis , J. Shane Culpepper , Maarten de Rijke
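
As an illustrative aside to the entry above: the structure it describes (for every term, a list of identifiers of the documents containing it, with term frequency as auxiliary information) can be sketched in a few lines of Python. The function name `build_inverted_index`, the naive whitespace tokenizer, and the toy documents are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings.

    `docs` is a list of strings; a document's id is its position in the list.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = Counter(text.lower().split())
        for term, tf in counts.items():
            # Documents are visited in id order, so each list stays sorted.
            index[term].append((doc_id, tf))
    return dict(index)


docs = ["compressed inverted index", "inverted index index", "fast search"]
index = build_inverted_index(docs)
print(index["index"])  # [(0, 1), (1, 2)]
```

Because documents are processed in increasing id order, every posting list comes out sorted by document identifier, which is the property that compression schemes for inverted lists typically rely on.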

For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs…

Information Retrieval · Computer Science 2013-05-06 Nima Asadi , Jimmy Lin

The performance of processing search queries depends heavily on the stored index size. Accordingly, considerable research efforts have been devoted to the development of efficient compression techniques for inverted indexes. Roughly, index…

Information Retrieval · Computer Science 2011-07-29 M. Feldman , R. Lempel , O. Somekh , K. Vornovitsky

Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements…

Information Retrieval · Computer Science 2015-03-19 George Beskales , Marcus Fontoura , Maxim Gurevich , Sergei Vassilvitskii , Vanja Josifovski

Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by…

Information Retrieval · Computer Science 2023-10-18 Peitian Zhang , Zheng Liu , Shitao Xiao , Zhicheng Dou , Jing Yao

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude , Antonio Fariña , Miguel A. Martínez-Prieto , Gonzalo Navarro

Searches for phrases and word sets in large text arrays by means of additional indexes are considered. Their use may reduce the query-processing time by an order of magnitude in comparison with standard inverted files.

Information Retrieval · Computer Science 2018-11-27 A. B. Veretennikov

Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning…

Information Retrieval · Computer Science 2019-07-23 Giulio Ermanno Pibiri

In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…

Databases · Computer Science 2020-09-25 Yongsik Kwon , Cheol Ryu , Sang Kyun Cha , Arthur H. Lee , Kunsoo Park , Bongki Moon

Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…

Information Retrieval · Computer Science 2009-11-18 Francisco Claude , Antonio Farina , Gonzalo Navarro

Data compression is a very important feature for saving memory space. In this proposal, an indexed dictionary-based compression is used for text data, where the word's reference in the dictionary is used for compression. This approach…

Other Computer Science · Computer Science 2015-12-23 Vivek Dimri , Prof. Ranjit Biswas

This paper addresses the construction of an inverted index for large-scale image retrieval. The inverted index proposed by J. Sivic brings a significant acceleration by limiting distance computations to only a small fraction of the database.…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Ying Wang

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…

Information Retrieval · Computer Science 2020-04-22 Daniel Lemire , Leonid Boytsov , Nathan Kurz

To work at scale, a complete image indexing system comprises two components: An inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query; An approximate distance…

Computer Vision and Pattern Recognition · Computer Science 2017-12-14 Himalaya Jain , Joaquin Zepeda , Patrick Pérez , Rémi Gribonval

Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller…

Information Retrieval · Computer Science 2012-06-20 Sebastiano Vigna
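
As an illustrative aside to the entry above: gap compression stores the differences between successive document pointers rather than the pointers themselves, then encodes those gaps with a code that is short for small values. The sketch below uses a variable-byte code for that second step. This is an assumed choice for illustration, not necessarily the code the paper discusses, and the function names and the convention of setting the high bit on the final byte of each number are likewise assumptions (other conventions exist).

```python
def to_gaps(doc_ids):
    """Turn increasing document ids into gaps (the first gap is the first id)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild document ids from gaps with a running sum."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, low-order bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # more bytes follow: high bit clear
            n >>= 7
        out.append(n | 0x80)      # last byte of this number: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


doc_ids = [3, 7, 1000, 1002]
encoded = vbyte_encode(to_gaps(doc_ids))
assert from_gaps(vbyte_decode(encoded)) == doc_ids
print(len(encoded))  # 5
```

Here the gaps are 3, 4, 993 and 2; only 993 needs two bytes, so the four pointers fit in five bytes.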

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…

Databases · Computer Science 2015-03-13 Daniel Lemire , Owen Kaser

The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri , Rossano Venturini