Information theoretic model validation for clustering

Joachim M. Buhmann


Subjects: Information Theory (math.IT); Machine Learning. Submitted 2010-06-03 (v1).


Abstract

Model selection in clustering requires (i) specifying a suitable clustering principle and (ii) controlling the model order complexity by choosing an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective in which the uncertainty in the measurements quantizes the set of data partitionings and thereby induces uncertainty in the solution space of clusterings. A clustering model that can tolerate a higher level of fluctuations in the measurements than alternative models is considered superior, provided that its clustering solution is equally informative. This tradeoff between informativeness and robustness serves as a model selection criterion. The requirement that data partitionings should generalize from one data set to an equally probable second data set gives rise to a new notion of structure-induced information.
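To make the informativeness/robustness tradeoff concrete, here is a minimal toy sketch. It is not the construction from the paper. It rests on assumptions made purely for illustration: the same n objects are measured twice (x1, x2), the solution space C is the set of all k**n labelings, each labeling c is scored by the k-means cost R(c, X), and labelings are weighted by Gibbs factors exp(-beta R(c, X)). The hypothetical function overlap_information computes (1/n) log(|C| Z12 / (Z1 Z2)), a measure of how much the weighted solution sets for the two measurements overlap beyond chance. At beta = 0 every labeling carries equal weight and the score is zero (maximally robust, uninformative). As beta grows, the weights concentrate on near-optimal labelings, and the score rises only if those labelings agree across both measurements.

```python
import itertools
import math


def kmeans_cost(labels, x, k):
    """Within-cluster sum of squared deviations (k-means cost) of a labeling on 1-D data x."""
    cost = 0.0
    for cluster in range(k):
        members = [xi for xi, lab in zip(x, labels) if lab == cluster]
        if members:  # empty clusters contribute no cost
            mean = sum(members) / len(members)
            cost += sum((xi - mean) ** 2 for xi in members)
    return cost


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def overlap_information(x1, x2, k, beta):
    """Hypothetical score (1/n) * log(|C| * Z12 / (Z1 * Z2)) with Gibbs weights exp(-beta * R).

    Z1, Z2 sum the weights of all labelings under each measurement; Z12 sums the
    product of both weights, i.e. the mass of labelings that are good for both.
    """
    n = len(x1)
    labelings = list(itertools.product(range(k), repeat=n))  # all k**n labelings
    r1 = [kmeans_cost(c, x1, k) for c in labelings]
    r2 = [kmeans_cost(c, x2, k) for c in labelings]
    log_z1 = log_sum_exp([-beta * r for r in r1])
    log_z2 = log_sum_exp([-beta * r for r in r2])
    log_z12 = log_sum_exp([-beta * (a + b) for a, b in zip(r1, r2)])
    return (math.log(len(labelings)) + log_z12 - log_z1 - log_z2) / n


# Toy data: two slightly different measurements of the same six objects.
X1 = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
X2 = [0.05, 0.15, 0.1, 5.05, 4.95, 5.2]

for beta in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(beta, overlap_information(X1, X2, 2, beta))
```

Because Z12 never exceeds Z1 * Z2 for positive weights, the score is bounded above by (1/n) log|C|. On this well-separated toy data it approaches (1/n) log(|C|/2) for large beta, since the only two optimal labelings are the same split with the cluster labels swapped. Brute-force enumeration is feasible only for tiny n; it is used here solely to keep the sketch transparent.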

Keywords

information theory, cluster analysis, belief revision

Cite

@article{arxiv.1006.0375,
  title  = {Information theoretic model validation for clustering},
  author = {Joachim M. Buhmann},
  journal= {arXiv preprint arXiv:1006.0375},
  year   = {2010}
}

Comments

9 pages, 2 figures, International Symposium on Information Theory 2010 (ISIT10 E-Mo-4.2), June 13-18 in Austin, TX
