Related papers: On Slicing Sorted Integer Sequences
The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…
Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…
The performance of processing search queries depends heavily on the stored index size. Accordingly, considerable research efforts have been devoted to the development of efficient compression techniques for inverted indexes. Roughly, index…
Finding desired information from large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. Index is the main constituent of an IR…
Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements…
The ubiquitous Variable-Byte encoding is one of the fastest compressed representation for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…
Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…
Compressed inverted indices in use today are based on the idea of gap compression: documents pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller…
This paper addresses the construction of inverted index for large-scale image retrieval. The inverted index proposed by J. Sivic brings a significant acceleration by reducing distance computations with only a small fraction of the database.…
Inverted indexes allow to query large databases without needing to search in the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…
Previous compact representations of permutations have focused on adding a small index on top of the plain data $<\pi(1), \pi(2),...\pi(n)>$, in order to efficiently support the application of the inverse or the iterated permutation. In this…
With the advent of social networks and the web, the graph sizes have grown too large to fit in main memory precipitating the need for alternative approaches for an efficient, scalable evaluation of queries on graphs of any size. Here, we…
Analyzing a distributed computation is a hard problem in general due to the combinatorial explosion in the size of the state-space with the number of processes in the system. By abstracting the computation, unnecessary explorations can be…
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to…
Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, taking aside the memory requirements. In this paper, we revisit these approaches by considering, additionally, the memory…
We propose Ordered Subspace Clustering (OSC) to segment data drawn from a sequentially ordered union of subspaces. Similar to Sparse Subspace Clustering (SSC) we formulate the problem as one of finding a sparse representation but include an…
Many modern applications produce massive streams of data series that need to be analyzed, requiring efficient similarity search operations. However, the state-of-the-art data series indexes that are used for this purpose do not scale well…
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…
Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is…
For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs…