Efficient Centroid-Linkage Clustering

Data Structures and Algorithms 2024-06-10 v1

Abstract

We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new Centroid-Linkage HAC algorithm with a novel fully dynamic data structure for nearest neighbor search which works under adaptive updates. We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate Centroid-Linkage HAC algorithm. Compared to an existing state-of-the-art exact baseline, our implementation maintains the clustering quality while delivering up to a $36\times$ speedup due to performing fewer distance comparisons.
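For orientation, the following is a minimal sketch of naive exact Centroid-Linkage HAC, the kind of quadratic-scan procedure that an approximate algorithm aims to speed up. It is not the algorithm of the paper: it uses a brute-force closest-pair scan instead of a dynamic nearest-neighbor data structure, and the function name `centroid_hac` and the merge-record format are assumptions made for illustration. Under the usual reading of approximate HAC (assumed here, since the abstract does not define it), a $c$-approximate variant would be allowed to merge any pair whose centroid distance is within a factor $c$ of the current minimum.

```python
import math

def centroid_hac(points):
    """Naive exact centroid-linkage HAC (illustrative sketch only).

    Repeatedly merges the two clusters whose centroids are closest and
    returns the merges as (cluster_id_a, cluster_id_b, centroid_distance).
    Input points get ids 0..n-1; each merged cluster gets the next free id.
    """
    centroids = {i: tuple(float(x) for x in p) for i, p in enumerate(points)}
    sizes = {i: 1 for i in centroids}
    merges = []
    next_id = len(centroids)

    while len(centroids) > 1:
        ids = list(centroids)
        best = None
        # Brute-force scan over all pairs of current centroids.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[2]:
                    best = (a, b, d)

        a, b, d = best
        total = sizes[a] + sizes[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[next_id] = tuple(
            (sizes[a] * ca + sizes[b] * cb) / total
            for ca, cb in zip(centroids[a], centroids[b])
        )
        sizes[next_id] = total
        for old in (a, b):
            del centroids[old]
            del sizes[old]
        merges.append((a, b, d))
        next_id += 1

    return merges
```

Calling `centroid_hac([(0, 0), (1, 0), (10, 0)])` first merges points 0 and 1 and then merges the remaining point with the new cluster. Each round of this sketch scans all pairs, which is the cost that nearest-neighbor search is meant to reduce.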

Cite

@article{arxiv.2406.05066,
  title  = {Efficient Centroid-Linkage Clustering},
  author = {MohammadHossein Bateni and Laxman Dhulipala and Willem Fletcher and Kishen N Gowda and D Ellis Hershkowitz and Rajesh Jayaram and Jakub Łącki},
  journal= {arXiv preprint arXiv:2406.05066},
  year   = {2024}
}