Benefiting from Disorder: Source Coding for Unordered Data
Abstract
The order of letters is not always relevant in a communication task. This paper discusses the implications of order irrelevance on source coding, presenting results in several major branches of source coding theory: lossless coding, universal lossless coding, rate-distortion, high-rate quantization, and universal lossy coding. The main conclusions demonstrate that there are significant rate savings when order is irrelevant. In particular, lossless coding of n letters from a finite alphabet requires Θ(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources. However, there are no universal schemes that can drive a strong redundancy measure to zero. Results for lossy coding include distribution-free expressions for the rate savings from order irrelevance in various high-rate quantization schemes. Rate-distortion bounds are given, and it is shown that the analogue of the Shannon lower bound is loose at all finite rates.
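The Θ(log n) claim for finite alphabets can be made concrete by counting. If order is irrelevant, only the multiset (the letter counts) must be conveyed. Over a K-letter alphabet there are C(n + K - 1, K - 1) such multisets, a polynomial in n of degree K - 1, so indexing one takes about (K - 1) log2 n bits, versus n log2 K bits for an ordered string. The sketch below is illustrative only and is not the construction used in the paper. It assumes a fixed, known alphabet and a decoder that already knows n. The names `index_bits`, `sequence_bits`, `multiset_bits`, `encode_type`, and `decode_type` are hypothetical helpers introduced here.

```python
from collections import Counter
from math import comb


def index_bits(m: int) -> int:
    """ceil(log2(m)) for m >= 1, computed exactly with integer arithmetic."""
    return (m - 1).bit_length()


def sequence_bits(n: int, k: int) -> int:
    """Fixed-length bits to index every ordered length-n string over k letters."""
    return index_bits(k ** n)


def multiset_bits(n: int, k: int) -> int:
    """Fixed-length bits to index every size-n multiset over k letters.

    Stars and bars: there are comb(n + k - 1, k - 1) such multisets, a
    polynomial in n of degree k - 1, so this grows like (k - 1) log2 n.
    """
    return index_bits(comb(n + k - 1, k - 1))


def encode_type(letters, alphabet) -> str:
    """Hypothetical encoder: write the count of each letter except the last in a
    fixed-width field. Assumes the decoder already knows n and the alphabet."""
    n = len(letters)
    width = max(1, index_bits(n + 1))  # every count lies in 0..n
    counts = Counter(letters)
    return "".join(format(counts[a], "b").zfill(width) for a in alphabet[:-1])


def decode_type(bits: str, n: int, alphabet) -> Counter:
    """Inverse of encode_type: recovers the multiset, not the original order."""
    width = max(1, index_bits(n + 1))
    counts = [int(bits[i * width:(i + 1) * width], 2) for i in range(len(alphabet) - 1)]
    counts.append(n - sum(counts))  # the last count is implied by n
    return Counter({a: c for a, c in zip(alphabet, counts) if c})


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # With k = 4: ordered bits grow like 2n, unordered bits like 3 log2 n.
        print(n, sequence_bits(n, 4), multiset_bits(n, 4))
```

Run as a script, this prints `10 20 9`, `100 200 18`, and `1000 2000 28`: the ordered cost grows linearly in n while the unordered cost grows logarithmically. The simple `encode_type` uses (K - 1) * ceil(log2(n + 1)) bits, which is slightly more than the exact index but still Θ(log n) for a fixed alphabet.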
Cite
@article{arxiv.0708.2310,
title = {Benefiting from Disorder: Source Coding for Unordered Data},
author = {Lav R. Varshney and Vivek K. Goyal},
journal= {arXiv preprint arXiv:0708.2310},
year = {2007}
}
Comments
35 pages