Related papers: Large Alphabet Source Coding using Independent Com…
This paper investigates data compression that simultaneously allows local decoding and local update. The main result is a universal compression scheme for memoryless sources with the following features. The rate can be made arbitrarily…
A general method of source coding over expansion is proposed in this paper, which enables one to reduce the problem of compressing an analog (continuous-valued source) to a set of much simpler problems, compressing discrete sources.…
This paper considers the problem of distributed source coding for a large network. A major obstacle that poses an existential threat to practical deployment of conventional approaches to distributed coding is the exponential growth of the…
Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio…
Source coding is the canonical problem of data compression in information theory. In a locally encodable source coding, each compressed bit depends on only few bits of the input. In this paper, we show that a recently popular model of…
There is a class of entropy-coding methods which do not substitute symbols by code words (such as Huffman coding), but operate on intervals or ranges. This class includes three prominent members: conventional arithmetic coding, range…
Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue,…
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive…
The Shannon Noiseless coding theorem (the data-compression principle) asserts that for an information source with an alphabet $\mathcal X=\{0,\ldots ,\ell -1\}$ and an asymptotic equipartition property, one can reduce the number of stored…
Independent component analysis (ICA) is a statistical method for transforming an observable multi-dimensional random vector into components that are as statistically independent as possible from each other. Usually the ICA framework assumes…
Universal source coding at short blocklengths is considered for an exponential family of distributions. The \emph{Type Size} code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless…
The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental…
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super…
Interactive encoding and decoding based on binary low-density parity-check codes with syndrome accumulation (SA-LDPC-IED) is proposed and investigated. Assume that the source alphabet is $\mathbf{GF}(2)$, and the side information alphabet…
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of…
We show how universal codes can be used for solving some of the most important statistical problems for time series. By definition, a universal code (or a universal lossless data compressor) can compress any sequence generated by a…
In this paper, we propose a source coding scheme that represents data from unknown distributions through frequency and support information. Existing encoding schemes often compress data by sacrificing computational efficiency or by assuming…
The transmission or storage of signals typically involves data compression. The final processing step in compression systems is generally an entropy coding stage, which converts symbols into a bit stream based on their probability…
We introduce a universal quantization scheme based on random coding, and we analyze its performance. This scheme consists of a source-independent random codebook (typically_mismatched_ to the source distribution), followed by optimal…
It was recently shown that the lossless compression of a single source $X^n$ is achievable with a notion of strong locality; any $X_i$ can be decoded from a constant number of compressed bits, with a vanishing in $n$ probability of error.…