Related papers: Integer Set Compression and Statistical Modeling

Compressing combinatorial objects

Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…

Information Theory · Computer Science 2016-01-15 Christian Steinruecken

Data Compression with Prime Numbers

A compression algorithm is presented that uses the set of prime numbers. Sequences of numbers are correlated with the prime numbers, and labeled with the integers. The algorithm can be iterated on data sets, generating factors of doubles on…

General Physics · Physics 2007-05-23 Gordon Chalmers

Compressing Sets and Multisets of Sequences

This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…

Information Theory · Computer Science 2014-01-27 Christian Steinruecken

Lossless (and Lossy) Compression of Random Forests

Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of…

Machine Learning · Computer Science 2018-10-29 Amichai Painsky, Saharon Rosset

Compressed Representations of Permutations, and Applications

We explore various techniques to compress a permutation $\pi$ over $n$ integers, taking advantage of ordered subsequences in $\pi$, while supporting its application $\pi$(i) and the application of its inverse $\pi^{-1}(i)$ in small time. Our…

Data Structures and Algorithms · Computer Science 2009-02-09 Jérémy Barbay, Gonzalo Navarro

Statistical Mechanical Approach to Error Exponents of Lossy Data Compression

We present herein a scheme by which to accurately evaluate the error exponents of a lossy data compression problem, which characterize average probabilities over a code ensemble of compression failure and success above or below a critical…

Statistical Mechanics · Physics 2007-05-23 Tadaaki Hosaka, Yoshiyuki Kabashima

Guessing Revisited: A Large Deviations Approach

The problem of guessing a random string is revisited. A close relation between guessing and compression is first established. Then it is shown that if the sequence of distributions of the information spectrum satisfies the large deviation…

Information Theory · Computer Science 2010-08-12 Manjesh Kumar Hanawal, Rajesh Sundaresan

On Compressing Permutations and Adaptive Sorting

Previous compact representations of permutations have focused on adding a small index on top of the plain data $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$, in order to efficiently support the application of the inverse or the iterated permutation. In this…

Data Structures and Algorithms · Computer Science 2011-08-23 Jérémy Barbay, Gonzalo Navarro

Statistical distribution, host for encrypted information

The statistical distribution, when determined from an incomplete set of constraints, is shown to be suitable as host for encrypted information. We design an encoding/decoding scheme to embed such a distribution with hidden information. The…

Statistical Mechanics · Physics 2015-06-25 L. Rebollo-Neira, A. Plastino

Compression of enumerations and gain

We study the compressibility of enumerations in the context of Kolmogorov complexity, focusing on strong and weak forms of compression and their gain: the amount of auxiliary information embedded in the compressed enumeration. The existence…

Computation and Language · Computer Science 2025-06-18 George Barmpalias, Xiaoyan Zhang, Bohua Zhan

PivotCompress: Compression by Sorting

Sorted data is usually easier to compress than unsorted permutations of the same data. This motivates a simple compression scheme: specify the sorted permutation of the data along with a representation of the sorted data compressed…

Data Structures and Algorithms · Computer Science 2014-11-24 Oscar Stiffelman
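The snippet above describes splitting data into a sorting permutation plus the sorted values. The Python sketch below illustrates only that split, under our own assumptions. It takes integer input and uses a stable sort to fix the permutation. Consecutive differences stand in for whatever compressor is applied to the sorted values. The names `sort_split` and `restore` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def sort_split(data: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split data into (permutation, gaps of sorted values)."""
    # perm[k] is the original index of the k-th smallest value (stable sort)
    perm = sorted(range(len(data)), key=lambda i: data[i])
    sorted_vals = [data[i] for i in perm]
    if not sorted_vals:
        return perm, []
    # Keep the first value, then the non-negative differences between neighbours
    gaps = [sorted_vals[0]] + [b - a for a, b in zip(sorted_vals, sorted_vals[1:])]
    return perm, gaps


def restore(perm: List[int], gaps: List[int]) -> List[int]:
    """Hypothetical inverse of sort_split."""
    # Prefix sums of the gaps recover the sorted values
    sorted_vals, total = [], 0
    for g in gaps:
        total += g
        sorted_vals.append(total)
    # Scatter each sorted value back to its original position
    data = [0] * len(perm)
    for rank, idx in enumerate(perm):
        data[idx] = sorted_vals[rank]
    return data


perm, gaps = sort_split([5, 3, 9, 3])
print(perm, gaps)            # [1, 3, 0, 2] [3, 0, 2, 4]
print(restore(perm, gaps))   # [5, 3, 9, 3]
```

The gaps of sorted data are non-negative and often small, which makes them easier to compress. However, a permutation of $n$ distinct items can still cost up to about $\log_2 n!$ bits. Any gain therefore depends on how much the sorted representation saves.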

Encoding Arguments

Many proofs in discrete mathematics and theoretical computer science are based on the probabilistic method. To prove the existence of a good object, we pick a random object and show that it is bad with low probability. This method is…

Information Theory · Computer Science 2017-08-01 Pat Morin, Wolfgang Mulzer, Tommy Reddad

Model Compression for Dynamic Forecast Combination

The predictive advantage of combining several different predictive models is widely accepted. Particularly in time series forecasting problems, this combination is often dynamic to cope with potential non-stationary sources of variation…

Machine Learning · Statistics 2021-04-06 Vitor Cerqueira, Luis Torgo, Carlos Soares, Albert Bifet

Re-Pair Compression of Inverted Lists

Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…

Information Retrieval · Computer Science 2009-11-18 Francisco Claude, Antonio Farina, Gonzalo Navarro
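The snippet mentions a common baseline: encode the differences between consecutive positions with codes that favor small numbers. Below is a minimal sketch of that baseline, not the Re-Pair method the paper proposes. It pairs gap encoding with variable-byte coding. It assumes a strictly increasing list of non-negative document identifiers. The helper names are hypothetical. The byte convention used here (seven payload bits per byte, high bit set on the final byte of each number) is one of several in use.

```python
from itertools import accumulate
from typing import List


def to_gaps(postings: List[int]) -> List[int]:
    """Replace each position by its distance from the previous one."""
    prev, out = 0, []
    for p in postings:
        out.append(p - prev)
        prev = p
    return out


def from_gaps(gaps: List[int]) -> List[int]:
    """Prefix sums undo the gap transform."""
    return list(accumulate(gaps))


def vbyte_encode(numbers: List[int]) -> bytes:
    out = bytearray()
    for n in numbers:
        # Emit 7-bit groups, least significant first
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)  # high bit marks the last byte of this number
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:          # final byte: number complete
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers


postings = [3, 7, 150, 160, 100000]
encoded = vbyte_encode(to_gaps(postings))
print(len(encoded))                          # 8 bytes for five postings
print(from_gaps(vbyte_decode(encoded)))      # [3, 7, 150, 160, 100000]
```

Small gaps fit in a single byte, so dense lists shrink well. The cost is that random access or intersection requires decoding from the start of a block.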

Randomness Testing of Compressed Data

Random Number Generators play a critical role in a number of important applications. In practice, statistical testing is employed to gather evidence that a generator indeed produces numbers that appear to be random. In this paper, we…

Computational Complexity · Computer Science 2010-03-25 Weiling Chang, Binxing Fang, Xiaochun Yun, Shupeng Wang, Xiangzhan Yu

Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting

This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the size of the sample is allowed to be…

Information Theory · Computer Science 2014-01-17 Xiao Yang, Andrew R. Barron

Enumerating Segmented Patterns in Compositions and Encoding by Restricted Permutations

A composition of a nonnegative integer $n$ is a sequence of positive integers whose sum is $n$. A composition is palindromic if it is unchanged when its terms are read in reverse order. We provide a generating function for the number of…

Combinatorics · Mathematics 2007-05-23 Sergey Kitaev, Tyrrell B. McAllister, T. Kyle Petersen
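The generating function from the paper is not reproduced here. The sketch below is only a brute-force check of the two definitions in the snippet. It enumerates the compositions of a small $n$ by choosing which of the $n-1$ gaps between $n$ unit parts to cut, then tests each one for palindromicity by reversal. The function names are our own. The counts it produces agree with the standard results: $n \ge 1$ has $2^{n-1}$ compositions, and $2^{\lfloor n/2 \rfloor}$ of them are palindromic.

```python
from itertools import product
from typing import Iterator, Tuple


def compositions(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every composition of n >= 0 as a tuple of positive integers."""
    if n == 0:
        yield ()          # the empty composition is the only one of 0
        return
    # Each of the n-1 gaps between n unit parts is either cut or kept
    for cuts in product((False, True), repeat=n - 1):
        parts, run = [], 1
        for cut in cuts:
            if cut:
                parts.append(run)
                run = 1
            else:
                run += 1
        parts.append(run)
        yield tuple(parts)


def is_palindromic(c: Tuple[int, ...]) -> bool:
    """A composition is palindromic if it reads the same reversed."""
    return c == c[::-1]


pal = sorted(c for c in compositions(4) if is_palindromic(c))
print(pal)   # [(1, 1, 1, 1), (1, 2, 1), (2, 2), (4,)]
```

Exhaustive enumeration grows as $2^{n-1}$, so it is only useful for sanity-checking a closed form on small $n$.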

Compression in the Space of Permutations

We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion…

Information Theory · Computer Science 2016-11-18 Da Wang, Arya Mazumdar, Gregory Wornell

Compression, Generalization and Learning

A compression function is a map that slims down an observational set into a subset of reduced size, while preserving its informational content. In multiple applications, the condition that one new observation makes the compressed set change…

Machine Learning · Computer Science 2024-01-09 Marco C. Campi, Simone Garatti

Prefix Codes for Power Laws with Countable Support

In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many…

Information Theory · Computer Science 2009-03-06 Michael B. Baer
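The snippet names Golomb coding as an example of a prefix code for distributions that decline more quickly than a power law. For reference, here is a sketch of standard Golomb coding with parameter $m$. It is not the power-law code the paper proposes. The quotient $\lfloor n/m \rfloor$ is written in unary and the remainder in truncated binary. Codewords are returned as bit strings for readability. The function names are hypothetical.

```python
def golomb_encode(n: int, m: int) -> str:
    """Golomb codeword for n >= 0 with parameter m >= 1, as a bit string."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"               # quotient in unary, terminated by 0
    if m == 1:
        return bits                    # remainder is always 0: no extra bits
    b = (m - 1).bit_length()           # b = ceil(log2 m)
    cutoff = (1 << b) - m              # remainders below this use b-1 bits
    if r < cutoff:
        bits += format(r, "b").zfill(b - 1)
    else:
        bits += format(r + cutoff, "b").zfill(b)
    return bits


def golomb_decode(bits: str, m: int) -> int:
    """Decode a single Golomb codeword produced by golomb_encode."""
    q = bits.index("0")                # length of the unary prefix
    pos = q + 1
    if m == 1:
        return q
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    x = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if x >= cutoff:                    # long remainder: read one more bit
        x = 2 * x + int(bits[pos]) - cutoff
    return q * m + x


print(golomb_encode(9, 4))   # 11001  (q = 2 in unary, r = 1 in two bits)
print(golomb_encode(7, 5))   # 1010   (q = 1, r = 2 in the short two-bit form)
```

Codeword length grows linearly in $n/m$. This suits geometric-like tails, but it is wasteful for heavy power-law tails, which is the gap the snippet points to.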