
The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering

Methodology 2011-01-11 v2 General Mathematics

Abstract

An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to very high frequency time series segmentation and modeling.
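As a rough illustration of what quantifying ultrametricity can look like, the sketch below samples random triplets of points and counts how often the two longest sides of the triangle they form are nearly equal, which is the isosceles-with-small-base shape that the ultrametric inequality d(x, z) <= max(d(x, y), d(y, z)) requires. This is a simple proxy chosen for illustration and is not necessarily the coefficient used in the paper; the function names, the 5% relative tolerance, the Euclidean distance, and the uniformly distributed test data are all assumptions.

```python
import numpy as np


def ultrametric_fraction(points, n_triplets=2000, rel_tol=0.05, seed=0):
    """Fraction of sampled triplets whose two longest sides are nearly equal.

    Illustrative proxy only (an assumption, not the paper's measure).
    `points` is an (n, d) array with n >= 3; distances are Euclidean.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    hits = 0
    for _ in range(n_triplets):
        # Pick three distinct points.
        i, j, k = rng.choice(n, size=3, replace=False)
        sides = np.sort([
            np.linalg.norm(points[i] - points[j]),
            np.linalg.norm(points[j] - points[k]),
            np.linalg.norm(points[i] - points[k]),
        ])
        # In an ultrametric triangle the two longest sides coincide;
        # accept a relative gap of at most rel_tol.
        if sides[2] - sides[1] <= rel_tol * sides[2]:
            hits += 1
    return hits / n_triplets


def demo(dims=(2, 20, 2000), n_points=100, seed=0):
    """Proxy value for uniform random data in several dimensionalities."""
    rng = np.random.default_rng(seed)
    return {
        d: ultrametric_fraction(rng.uniform(size=(n_points, d)), seed=seed)
        for d in dims
    }


if __name__ == "__main__":
    for d, frac in demo().items():
        print(f"dimension {d}: fraction of near-ultrametric triplets = {frac:.3f}")
```

Under these assumptions, the fraction is expected to rise with dimensionality, because pairwise distances between uniformly distributed points concentrate around a common value as the dimension grows.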

Cite

@article{arxiv.0805.2756,
  title  = {The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering},
  author = {Fionn Murtagh},
  journal= {arXiv preprint arXiv:0805.2756},
  year   = {2011}
}

Comments

36 pages, 18 figures, 36 references
