High Dimensional Cluster Analysis Using Path Lengths

Data Analysis, Statistics and Probability 2017-10-16 v1 Data Structures and Algorithms

Abstract

A hierarchical scheme for clustering data is presented which applies to spaces with a high number of dimensions ($N_D > 3$). The data set is first reduced to a smaller set of partitions (multi-dimensional bins). Multiple clustering techniques are used, including spectral clustering; in addition, new techniques are introduced based on the path length between partitions that are connected to one another. A Line-Of-Sight algorithm is also developed for clustering. A test bank of 12 data sets with varying properties is used to expose the strengths and weaknesses of each technique. Finally, a robust clustering technique is discussed that is based on reaching a consensus among the multiple approaches, overcoming the weaknesses each one shows individually.
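The abstract does not specify how partitions are built or how path lengths are measured, so the following is only an illustrative sketch and not the authors' algorithm. It assumes uniform bins of a fixed width, treats two occupied bins as connected when their indices differ by at most one in every coordinate, and takes path length to be the hop count found by breadth-first search. The function names `bin_points`, `neighbors`, `path_lengths`, and `cluster_by_connectivity` are hypothetical.

```python
from collections import deque
from itertools import product


def bin_points(points, width):
    """Reduce points to the set of occupied bins (integer index tuples).

    Assumption: uniform bin width in every dimension.
    """
    return {tuple(int(x // width) for x in p) for p in points}


def neighbors(cell):
    """Yield every cell whose index differs by at most 1 per coordinate."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):  # skip the all-zero offset (the cell itself)
            yield tuple(c + o for c, o in zip(cell, offset))


def path_lengths(occupied, start):
    """Breadth-first hop counts from start through adjacent occupied bins."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for nb in neighbors(cell):
            if nb in occupied and nb not in dist:
                dist[nb] = dist[cell] + 1
                queue.append(nb)
    return dist


def cluster_by_connectivity(occupied):
    """Group occupied bins into clusters of mutually reachable bins.

    Each cluster is a dict mapping bin -> path length from the cluster's
    (arbitrarily chosen) starting bin.
    """
    remaining = set(occupied)
    clusters = []
    while remaining:
        start = min(remaining)  # deterministic choice of seed bin
        reach = path_lengths(occupied, start)
        clusters.append(reach)
        remaining.difference_update(reach)
    return clusters


# Two well-separated groups of points in 4 dimensions (made-up data).
points = [
    (0.1, 0.1, 0.1, 0.1),
    (1.2, 0.3, 0.2, 0.1),
    (2.1, 1.5, 0.4, 0.2),
    (10.0, 10.0, 10.0, 10.0),
    (11.0, 10.5, 10.2, 10.9),
]
clusters = cluster_by_connectivity(bin_points(points, width=1.0))
for cluster in clusters:
    print(sorted(cluster.items()))
```

Because each cell has $3^{N_D} - 1$ candidate neighbors, enumerating them this way becomes costly as the dimension grows; a practical implementation would more likely iterate over the occupied bins only.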

Cite

@article{arxiv.1710.04886,
  title  = {High Dimensional Cluster Analysis Using Path Lengths},
  author = {Kevin McIlhany and Stephen Wiggins},
  journal= {arXiv preprint arXiv:1710.04886},
  year   = {2017}
}

Comments

52 pages, 94 figures