
Benefiting from Disorder: Source Coding for Unordered Data

Information Theory 2007-08-20 v1 math.IT

Abstract

The order of letters is not always relevant in a communication task. This paper discusses the implications of order irrelevance for source coding, presenting results in several major branches of source coding theory: lossless coding, universal lossless coding, rate-distortion, high-rate quantization, and universal lossy coding. The main conclusions demonstrate that the rate savings are significant when order is irrelevant. In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits, and universal lossless coding requires n + o(n) bits for many countable alphabet sources. However, there are no universal schemes that can drive a strong redundancy measure to zero. Results for lossy coding include distribution-free expressions for the rate savings from order irrelevance in various high-rate quantization schemes. Rate-distortion bounds are given, and it is shown that the analogue of the Shannon lower bound is loose at all finite rates.
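As a rough illustration of the logarithmic scaling in the lossless finite-alphabet case, the sketch below compares fixed-length index sizes for ordered sequences and for unordered collections (multisets). It is a simple counting argument, not the paper's derivation: it relies only on the standard fact that there are C(n+k-1, k-1) multisets of size n over k letters, which is at most (n+1)^(k-1). The function names `index_bits`, `ordered_bits`, and `unordered_bits` are hypothetical and chosen for this example.

```python
from math import comb


def index_bits(count: int) -> int:
    """Bits needed for a fixed-length index over `count` distinct items."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1.
    return (count - 1).bit_length()


def ordered_bits(n: int, k: int) -> int:
    """Index size when order matters: there are k**n sequences of length n."""
    return index_bits(k ** n)


def unordered_bits(n: int, k: int) -> int:
    """Index size when order is irrelevant: count multisets of size n."""
    return index_bits(comb(n + k - 1, k - 1))


if __name__ == "__main__":
    k = 4  # assumed alphabet size, for illustration only
    for n in (8, 64, 512, 4096):
        print(n, ordered_bits(n, k), unordered_bits(n, k))
```

For a fixed alphabet size, the ordered index grows linearly in n while the unordered index grows roughly like (k-1) log2(n), which matches the order of growth stated in the abstract.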


Cite

@article{arxiv.0708.2310,
  title  = {Benefiting from Disorder: Source Coding for Unordered Data},
  author = {Lav R. Varshney and Vivek K. Goyal},
  journal= {arXiv preprint arXiv:0708.2310},
  year   = {2007}
}

Comments

35 pages
