
Distributed Lance-William Clustering Algorithm

Distributed, Parallel, and Cluster Computing 2017-09-21 v1

Abstract

Optimal clustering of data into useful categories is an important tool. Dividing similar objects into a smaller number of clusters matters in many applications, including search engines, monitoring of academic performance, biology, and wireless networks. We first discuss a number of clustering methods. We then present a parallel algorithm for efficiently clustering objects into groups based on their similarity to each other. The input is an n by n distance matrix that holds a distance ranking for each pair of objects; the smaller the number, the more similar the two objects are. We use parallel processors to compute a hierarchical clustering of the n items from this matrix. A further advantage of our method is that the large n by n matrix is distributed across the processors. We have implemented the algorithm and found it to be scalable in terms of both processing speed and storage.
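The abstract does not give the merge rule, but the title names the Lance-Williams recurrence. That recurrence updates the distance from any cluster k to a newly merged cluster i∪j as d(k, i∪j) = α_i·d(k,i) + α_j·d(k,j) + β·d(i,j) + γ·|d(k,i) − d(k,j)|. The sketch below is a sequential Python illustration under stated assumptions, not the authors' implementation. The "processors" are simulated by splitting the matrix rows into p contiguous blocks. Each block finds its local closest pair, and a min-reduction then picks the global pair. This row-block layout and the names `coefficients`, `row_blocks`, and `cluster` are hypothetical. The input is assumed to be a symmetric matrix with a zero diagonal.

```python
from typing import List, Tuple

Merge = Tuple[int, int, float]


def coefficients(method: str, ni: int, nj: int) -> Tuple[float, float, float, float]:
    """Lance-Williams (alpha_i, alpha_j, beta, gamma) for a few standard linkages."""
    if method == "single":    # reduces to min(d(k,i), d(k,j))
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":  # reduces to max(d(k,i), d(k,j))
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":   # UPGMA: size-weighted mean
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    raise ValueError(f"unknown method: {method}")


def row_blocks(n: int, p: int) -> List[range]:
    """Hypothetical partition: split rows 0..n-1 into p contiguous blocks, one per 'processor'."""
    size, extra = divmod(n, p)
    blocks, start = [], 0
    for r in range(p):
        end = start + size + (1 if r < extra else 0)
        blocks.append(range(start, end))
        start = end
    return blocks


def cluster(D: List[List[float]], method: str = "single", p: int = 2) -> List[Merge]:
    """Agglomerative clustering; returns merges as (kept cluster, absorbed cluster, height)."""
    n = len(D)
    d = [row[:] for row in D]  # working copy of the distance matrix
    size = [1] * n             # number of original objects in each cluster
    active = set(range(n))     # cluster ids that have not been absorbed yet
    merges: List[Merge] = []
    blocks = row_blocks(n, p)
    while len(active) > 1:
        order = sorted(active)
        # Each block scans only its own rows for the closest active pair (i < j).
        local = []
        for block in blocks:
            best = None
            for i in block:
                if i not in active:
                    continue
                for j in order:
                    if j > i and (best is None or d[i][j] < best[0]):
                        best = (d[i][j], i, j)
            if best is not None:
                local.append(best)
        # Min-reduction over the per-block candidates (ties broken by smallest i, then j).
        dist, i, j = min(local)
        ai, aj, b, g = coefficients(method, size[i], size[j])
        # Merge j into i and refresh row/column i with the Lance-Williams recurrence.
        for k in order:
            if k in (i, j):
                continue
            new = ai * d[k][i] + aj * d[k][j] + b * d[i][j] + g * abs(d[k][i] - d[k][j])
            d[i][k] = d[k][i] = new
        size[i] += size[j]
        active.remove(j)
        merges.append((i, j, dist))
    return merges
```

For four points at positions 0, 1, 5, and 6 on a line, single linkage first merges {0, 1} and {2, 3} at height 1, then joins the two pairs at height 4. Because each block's candidate and the final reduction both pick the lexicographically smallest (distance, i, j), the merge sequence does not depend on how many blocks p the rows are split into. In a real distributed setting, each processor would hold only its block of rows, and the reduction would be a collective operation.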


Cite

@article{arxiv.1709.06816,
  title  = {Distributed Lance-William Clustering Algorithm},
  author = {Gavriel Yarmish and Philip Listowsky and Simon Dexter},
  journal= {arXiv preprint arXiv:1709.06816},
  year   = {2017}
}