Related papers: Random Permutation Codes: Lossless Source Coding o…
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…
We present an optimal method for encoding cluster assignments of arbitrary data sets. Our method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment information as cycles of the permutation defined by the order of…
This paper introduces a new source coding paradigm called Sequential Massive Random Access (SMRA). In SMRA, a set of correlated sources is encoded once and for all and stored on a server, and clients want to successively access only a subset…
A new framework is introduced for examining and evaluating the fundamental limits of lossless data compression that emphasizes genuinely non-asymptotic results. The sample complexity of compressing a given source is defined as the…
Traditionally, data compression deals with the problem of concisely representing a data source, e.g. a sequence of letters, for the purpose of eventual reproduction (either exact or approximate). In this work we are interested in the case…
Motivation: Next Generation Sequencing technologies revolutionized many fields in biology by enabling the fast and cheap sequencing of large amounts of genomic data. The ever-increasing sequencing capacities enabled by current sequencing…
Second-order asymptotics of fixed-length source coding and intrinsic randomness are discussed with a constant error constraint. There was a difference between optimal rates of fixed-length source coding and intrinsic randomness, which never…
Most real-world problems that machine learning algorithms are expected to solve involve 1) an unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric…
Clustering uncertain data is an essential task in data mining for the internet of things. Possible world based algorithms seem promising for clustering uncertain data. However, there are two issues in existing possible world based…
Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression can be lossless (i.e., the encoded data can be recovered exactly from its representation) or lossy, where the data is…
Many information sources are not just sequences of distinguishable symbols but rather have invariances governed by alternative counting paradigms such as permutations, combinations, and partitions. We consider an entire classification of…
A new run length encoding algorithm for lossless data compression that exploits positional redundancy by representing data in a two-dimensional model of concentric circles is presented. This visual transform enables detection of runs (each…
In this paper, we propose distributed network compression via memory. We consider two spatially separated sources with correlated unknown source parameters. We wish to study the universal compression of a sequence of length $n$ from…
This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…
In distributed systems where strong consistency is costly when not impossible, causal consistency provides a valuable abstraction to represent program executions as partial orders. In addition to the sequential program order of each…
We address the recently suggested problem of causal lossless coding of randomly arriving source samples. We construct variable-to-fixed coding schemes and show that they outperform the previously considered fixed-to-variable schemes when…
We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion…
The problem of lossless fixed-rate streaming coding of discrete memoryless sources with side information at the decoder is studied. A random time-varying tree-code is used to sequentially bin strings and a Stack Algorithm with a variable…
This paper is dedicated to lossless data compression with probability estimation using neural networks. First, we propose a probability estimation architecture based on a chain of neural predictors, so that each unit of the chain is defined…
Universal fixed-to-variable lossless source coding for memoryless sources is studied in the finite blocklength and higher-order asymptotics regimes. Optimal third-order coding rates are derived for general fixed-to-variable codes and for…