Data ultrametricity and clusterability

Dan Simovici; Kaixun Hua

doi:10.1088/1742-6596/1334/1/012002

Machine Learning, 2020-01-08, v1

Abstract

The growing need to cluster massive datasets and the high cost of running clustering algorithms pose difficult problems for users. In this context it is important to determine whether a dataset is clusterable, that is, whether it can be partitioned efficiently into well-differentiated groups containing similar objects. We approach data clusterability from an ultrametric-based perspective. We propose a novel approach to determining the ultrametricity of a dataset via a special type of matrix product, which allows us to evaluate the clusterability of the dataset. Furthermore, we show that applying our technique to a dissimilarity space generates the subdominant ultrametric of the dissimilarity.
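
The abstract does not spell out the matrix product, so the following is only a minimal sketch under a stated assumption: that the product is of min-max type, with entry (i, j) equal to the minimum over k of max(a[i][k], b[k][j]). Repeatedly squaring a symmetric dissimilarity matrix with a zero diagonal under that product until it stops changing is a standard way to obtain the subdominant ultrametric. The function names `minmax_product`, `subdominant_ultrametric`, and `is_ultrametric` are hypothetical and do not come from the paper.

```python
def minmax_product(a, b):
    """Assumed min-max product: entry (i, j) is min over k of max(a[i][k], b[k][j])."""
    n = len(a)
    return [
        [min(max(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def subdominant_ultrametric(d):
    """Square d under the min-max product until a fixed point is reached.

    Assumes d is a symmetric dissimilarity matrix with a zero diagonal,
    so each squaring can only lower entries and the loop terminates.
    """
    current = [list(row) for row in d]
    while True:
        nxt = minmax_product(current, current)
        if nxt == current:
            return current
        current = nxt


def is_ultrametric(d):
    """Check the strong triangle inequality d[i][j] <= max(d[i][k], d[k][j])."""
    n = len(d)
    return all(
        d[i][j] <= max(d[i][k], d[k][j])
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Small illustrative dissimilarity (made-up values, not data from the paper).
d = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
u = subdominant_ultrametric(d)
```

In this toy example only the entry between the first and third objects changes: it drops from 4 to 2, because the path through the second object has a largest step of 2. The cubic-time list comprehension is chosen for readability; it is not meant to reflect the cost or the exact procedure of the authors' technique.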

Keywords

cluster analysis, graph clustering, data warehousing

Cite

@article{arxiv.1908.10833,
  title  = {Data ultrametricity and clusterability},
  author = {Dan Simovici and Kaixun Hua},
  journal= {arXiv preprint arXiv:1908.10833},
  year   = {2020}
}
