Distributed Information-Theoretic Clustering

Information Theory 2021-11-29 v7 Machine Learning math.IT

Abstract

We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences $X^n$ and $Y^n$, respectively. The goal is to find rate-limited encodings $f(x^n)$ and $g(y^n)$ that maximize the mutual information $I(f(X^n); g(Y^n))/n$. We discuss connections of this problem with hypothesis testing against independence, pattern recognition, and the information bottleneck method. Improving previous cardinality bounds for the inner and outer bounds allows us to thoroughly study the special case of a binary symmetric source and to quantify the gap between the inner and the outer bound in this special case. Furthermore, we investigate a multiple description (MD) extension of the Chief Executive Officer (CEO) problem with mutual information constraint. Surprisingly, this MD-CEO problem permits a tight single-letter characterization of the achievable region.
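
The objective stated in the abstract can be written out as a constrained maximization. The following is only a sketch: it assumes that "rate-limited" means each encoder's output alphabet has at most $2^{nR}$ elements, and the symbols $R_1$, $R_2$, $\mathcal{M}_1$, $\mathcal{M}_2$ are notation introduced here for illustration, not taken from the abstract.

```latex
% Assumed formalization: R_1, R_2 are the rate limits of the two encoders,
% and \mathcal{M}_1, \mathcal{M}_2 are their (hypothetical) output index sets.
\begin{align*}
  \text{maximize}   \quad & \frac{1}{n}\, I\bigl(f(X^n);\, g(Y^n)\bigr) \\
  \text{subject to} \quad & f \colon \mathcal{X}^n \to \mathcal{M}_1,
                            \quad \frac{1}{n}\log_2 |\mathcal{M}_1| \le R_1, \\
                          & g \colon \mathcal{Y}^n \to \mathcal{M}_2,
                            \quad \frac{1}{n}\log_2 |\mathcal{M}_2| \le R_2 .
\end{align*}
```

Under this reading, each encoder sees only its own sequence, so $f$ depends on $x^n$ alone and $g$ on $y^n$ alone; the two are coupled only through the joint distribution of $(X^n, Y^n)$.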

Cite

@article{arxiv.1602.04605,
  title  = {Distributed Information-Theoretic Clustering},
  author = {Georg Pichler and Pablo Piantanida and Gerald Matz},
  journal= {arXiv preprint arXiv:1602.04605},
  year   = {2021}
}

Comments

30 pages, 4 figures, 1 table; published in Information and Inference