Related papers: On Slicing Sorted Integer Sequences

Techniques for Inverted Index Compression

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

Reordering Columns for Smaller Indexes

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…

Databases · Computer Science 2015-03-13 Daniel Lemire, Owen Kaser

On the Impact of Random Index-Partitioning on Index Compression

The performance of processing search queries depends heavily on the stored index size. Accordingly, considerable research efforts have been devoted to the development of efficient compression techniques for inverted indexes. Roughly, index…

Information Retrieval · Computer Science 2011-07-29 M. Feldman, R. Lempel, O. Somekh, K. Vornovitsky

A New Compression Based Index Structure for Efficient Information Retrieval

Finding desired information in a large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR…

Information Retrieval · Computer Science 2012-09-26 Md. Abdullah al Mamun, Md. Hanif, Md. Rakib Uddin, Tanvir Ahmed, Md. Mofizul Islam

Factorization-based Lossless Compression of Inverted Indices

Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements…

Information Retrieval · Computer Science 2015-03-19 George Beskales, Marcus Fontoura, Maxim Gurevich, Sergei Vassilvitskii, Vanja Josifovski

On Optimally Partitioning Variable-Byte Codes

The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

Universal Indexes for Highly Repetitive Document Collections

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro

Quasi-Succinct Indices

Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller…

Information Retrieval · Computer Science 2012-06-20 Sebastiano Vigna

Inverted Semantic-Index for Image Retrieval

This paper addresses the construction of inverted index for large-scale image retrieval. The inverted index proposed by J. Sivic brings a significant acceleration by reducing distance computations with only a small fraction of the database.…

Computer Vision and Pattern Recognition · Computer Science 2022-06-28 Ying Wang

Compressing integer lists with Contextual Arithmetic Trits

Inverted indexes make it possible to query large databases without searching the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…

Databases · Computer Science 2025-05-06 Yann Barsamian, André Chailloux

On Compressing Permutations and Adaptive Sorting

Previous compact representations of permutations have focused on adding a small index on top of the plain data $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$, in order to efficiently support the application of the inverse or the iterated permutation. In this…

Data Structures and Algorithms · Computer Science 2011-08-23 Jérémy Barbay, Gonzalo Navarro

Query Processing on Large Graphs: Approaches To Scalability and Response Time Trade Offs

With the advent of social networks and the web, graph sizes have grown too large to fit in main memory, precipitating the need for alternative approaches for an efficient, scalable evaluation of queries on graphs of any size. Here, we…

Databases · Computer Science 2019-05-15 Soumyava Das, Abhishek Santra, Jay Bodra, Sharma Chakravarthy

Distributed Abstraction Algorithm for Online Predicate Detection

Analyzing a distributed computation is a hard problem in general due to the combinatorial explosion in the size of the state-space with the number of processes in the system. By abstracting the computation, unnecessary explorations can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-05 Himanshu Chauhan, Vijay K. Garg, Aravind Natarajan, Neeraj Mittal

LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to…

Data Structures and Algorithms · Computer Science 2010-09-30 Jérémy Barbay, Johannes Fischer

Link and code: Fast indexing with graphs and compact regression codes

Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, setting aside the memory requirements. In this paper, we revisit these approaches by considering, additionally, the memory…

Computer Vision and Pattern Recognition · Computer Science 2018-06-07 Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

Segmentation of Subspaces in Sequential Data

We propose Ordered Subspace Clustering (OSC) to segment data drawn from a sequentially ordered union of subspaces. Similar to Sparse Subspace Clustering (SSC) we formulate the problem as one of finding a sparse representation but include an…

Computer Vision and Pattern Recognition · Computer Science 2015-04-17 Stephen Tierney, Yi Guo, Junbin Gao

Coconut: sortable summarizations for scalable indexes over static and streaming data series

Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well…

Databases · Computer Science 2021-04-19 Haridimos Kondylakis, Niv Dayan, Kostas Zoumpatianos, Themis Palpanas

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

String Indexing with Compressed Patterns

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is…

Data Structures and Algorithms · Computer Science 2024-02-15 Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections

For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs…

Information Retrieval · Computer Science 2013-05-06 Nima Asadi, Jimmy Lin