Related papers: PivotCompress: Compression by Sorting
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of…
We study the problem of efficient compression of a stochastic source of probability distributions. It can be viewed as a generalization of Shannon's source coding problem. It has relation to the theory of common randomness, as well as to…
We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion…
We study the following one-way asymmetric transmission problem, also a variant of model-based compressed sensing: a resource-limited encoder has to report a small set $S$ from a universe of $N$ items to a more powerful decoder (server). The…
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher…
The Shannon Noiseless coding theorem (the data-compression principle) asserts that for an information source with an alphabet $\mathcal X=\{0,\ldots ,\ell -1\}$ and an asymptotic equipartition property, one can reduce the number of stored…
We show how universal codes can be used for solving some of the most important statistical problems for time series. By definition, a universal code (or a universal lossless data compressor) can compress any sequence generated by a…
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…
The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental…
Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each…
The task of compression of data -- as stated by the source coding theorem -- is one of the cornerstones of information theory. Data compression usually exploits statistical redundancies in the data according to its prior distribution.…
A compression algorithm is presented that uses the set of prime numbers. Sequences of numbers are correlated with the prime numbers, and labeled with the integers. The algorithm can be iterated on data sets, generating factors of doubles on…
Suppose there is a large file which should be transmitted (or stored) and there are several (say, m) admissible data-compressors. It seems natural to try all the compressors and then choose the best, i.e. the one that gives the shortest…
We introduce a new protocol for a lossy data compression algorithm which is based on constraint satisfaction gates. We show that the theoretical capacity of algorithms built from standard parity-check gates converges exponentially fast to…
It is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction the more skewed the conditional probability…
We investigate how to measure and define the entropy of a simple chaotic system, three hard spheres on a ring. A novel approach is presented, which does not assume the ergodic hypothesis. It consists of transforming the particles collision…
In this paper we propose an index key compression scheme based on the notion of distinction bits by proving that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys correctly. While…
Shannon Entropy is the preeminent tool for measuring the level of uncertainty (and conversely, information content) in a random variable. In the field of communications, entropy can be used to express the information content of given…
We introduce a protocol called ENCORE which simultaneously compresses and encrypts data in a one-pass process that can be implemented efficiently and possesses a number of desirable features as a streaming encoder/decoder. Motivated by the…
A new framework is introduced for examining and evaluating the fundamental limits of lossless data compression, that emphasizes genuinely non-asymptotic results. The {\em sample complexity} of compressing a given source is defined as the…