A Simple and Efficient Method to Compute a Single Linkage Dendrogram

Data Structures and Algorithms 2019-11-04 v1 Machine Learning Computation Machine Learning

Abstract

We address the problem of computing a single linkage dendrogram. A possible approach is to: (i) Form an edge-weighted graph $G$ over the data, with edge weights reflecting dissimilarities. (ii) Calculate the MST $T$ of $G$. (iii) Break the longest edge of $T$, thereby splitting it into subtrees $T_L$, $T_R$. (iv) Apply the splitting process recursively to the subtrees. This approach has the attractive feature that Prim's algorithm for MST construction calculates distances as needed, and hence there is no need to ever store the inter-point distance matrix. The recursive partitioning algorithm requires us to determine the vertices (and edges) of $T_L$ and $T_R$. We show how this can be done easily and efficiently using information generated by Prim's algorithm without any additional computational cost.
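
Steps (i) to (iv) can be illustrated with a minimal Python sketch. It is not taken from the paper, and the names `prim_order`, `split`, and `leaves` are hypothetical. The sketch assumes that the "information generated by Prim's algorithm" is the order in which vertices join the tree together with the weight of the edge that attached each one, and it relies on the following property: if the heaviest attaching edge within a contiguous run of that order sits at position k (taking the last such position on ties), then every later vertex in the run attaches to a vertex at position k or later, so the two subtrees are simply the two sub-runs on either side of k. Whether this matches the authors' exact construction is an assumption.

```python
import math

def prim_order(points, dist):
    """Prim's algorithm with distances computed on demand (O(n) memory).

    Returns (order, weight): order[i] is the i-th vertex added to the tree,
    weight[i] is the weight of the MST edge that attached it (weight[0] = 0.0).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known link from each vertex to the tree
    order, weight = [], []
    current, current_w = 0, 0.0    # start from vertex 0 (arbitrary choice)
    for _ in range(n):
        in_tree[current] = True
        order.append(current)
        weight.append(current_w)
        nxt, nxt_w = -1, math.inf
        for v in range(n):
            if in_tree[v]:
                continue
            d = dist(points[current], points[v])   # never stored in a matrix
            if d < best[v]:
                best[v] = d
            if best[v] < nxt_w:
                nxt, nxt_w = v, best[v]
        current, current_w = nxt, nxt_w
    return order, weight

def split(order, weight, lo, hi):
    """Dendrogram for order[lo..hi] as nested tuples (height, left, right).

    Leaves are vertex indices. The heaviest attaching edge in (lo, hi] is
    broken; ties are resolved toward the last position.
    """
    if lo == hi:
        return order[lo]
    k = max(range(lo + 1, hi + 1), key=lambda i: (weight[i], i))
    return (weight[k], split(order, weight, lo, k - 1), split(order, weight, k, hi))

def leaves(node):
    """Set of vertex indices below a dendrogram node."""
    if not isinstance(node, tuple):
        return {node}
    return leaves(node[1]) | leaves(node[2])

# Usage on six points on a line, with absolute difference as dissimilarity.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
order, weight = prim_order(points, lambda a, b: abs(a - b))
tree = split(order, weight, 0, len(points) - 1)
```

In this sketch the scan for the heaviest edge is repeated at every level and the recursion can be as deep as the number of points, so it is meant to show the idea rather than to be efficient.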

Cite

@article{arxiv.1911.00223,
  title  = {A Simple and Efficient Method to Compute a Single Linkage Dendrogram},
  author = {Huanbiao Zhu and Werner Stuetzle},
  journal= {arXiv preprint arXiv:1911.00223},
  year   = {2019}
}