Related papers: On Compressing Permutations and Adaptive Sorting
We explore various techniques to compress a permutation $\pi$ over n integers, taking advantage of ordered subsequences in $\pi$, while supporting its application $\pi$(i) and the application of its inverse $\pi^{-1}(i)$ in small time. Our…
For a permutation $\pi$, and the corresponding permutation matrix, we introduce the notion of {\em discrete derivative}, obtained by taking differences of successive entries in $\pi$. We characterize the possible derivatives of…
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to…
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher…
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…
We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion…
Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…
Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning…
There is a deep connection between permutations and trees. Certain sub-structures of permutations, called sub-permutations, bijectively map to sub-trees of binary increasing trees. This opens a powerful tool set to study enumerative and…
Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right…
A consecutive pattern in a permutation $\pi$ is another permutation $\sigma$ determined by the relative order of a subsequence of contiguous entries of $\pi$. Traditional notions such as descents, runs and peaks can be viewed as particular…
We assume the permutation $\pi$ is given by an $n$-element array in which the $i$-th element denotes the value $\pi(i)$. Constructing its inverse in-place (i.e. using $O(\log{n})$ bits of additional memory) can be achieved in linear time…
Sorted data is usually easier to compress than unsorted permutations of the same data. This motivates a simple compression scheme: specify the sorted permutation of the data along with a representation of the sorted data compressed…
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…
Commuting operations greatly simplify consistency in distributed systems. This paper focuses on designing for commutativity, a topic neglected previously. We show that the replicas of \emph{any} data type for which concurrent operations…
We study the compressive diffusion strategies over distributed networks based on the diffusion implementation and adaptive extraction of the information from the compressed diffusion data. We demonstrate that one can achieve a comparable…
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…
Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. Over fifty years of usage have produced many variants and implementations to overcome some of their…
The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…
Counting distinct permutations with replacement, especially when involving multiple subwords, is a longstanding challenge in combinatorial analysis, with critical applications in cryptography, bioinformatics, and statistical modeling. This…