Related papers: Efficient Centroid-Linkage Clustering

Parallel Hierarchical Agglomerative Clustering in Low Dimensions

Hierarchical Agglomerative Clustering (HAC) is an extensively studied and widely used method for hierarchical clustering in $\mathbb{R}^k$ based on repeatedly merging the closest pair of clusters according to an input linkage function $d$.…

Data Structures and Algorithms · Computer Science 2025-07-29 MohammadHossein Bateni , Laxman Dhulipala , Willem Fletcher , Kishen N Gowda , D Ellis Hershkowitz , Rajesh Jayaram , Jakub Łącki

Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time

We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$ time…

Data Structures and Algorithms · Computer Science 2021-06-11 Laxman Dhulipala , David Eisenstat , Jakub Łącki , Vahab Mirrokni , Jessica Shi

It's Hard to HAC with Average Linkage!

Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficient…

Data Structures and Algorithms · Computer Science 2025-02-06 MohammadHossein Bateni , Laxman Dhulipala , Kishen N Gowda , D Ellis Hershkowitz , Rajesh Jayaram , Jakub Łącki

Scaling Hierarchical Agglomerative Clustering to Billion-sized Datasets

Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods. However, HAC is notoriously hard to scale to large data sets as the underlying complexity is at least quadratic in the number of…

Machine Learning · Computer Science 2021-05-26 Baris Sumengen , Anand Rajagopalan , Gui Citovsky , David Simcha , Olivier Bachem , Pradipta Mitra , Sam Blasiak , Mason Liang , Sanjiv Kumar

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly…

Data Structures and Algorithms · Computer Science 2022-06-24 Laxman Dhulipala , David Eisenstat , Jakub Łącki , Vahab Mirrokni , Jessica Shi

TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs

We introduce TeraHAC, a $(1+\epsilon)$-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing $(1+\epsilon)$-approximate HAC, which is a…

Data Structures and Algorithms · Computer Science 2024-06-12 Laxman Dhulipala , Jason Lee , Jakub Łącki , Vahab Mirrokni

Chamfer-Linkage for Hierarchical Agglomerative Clustering

Hierarchical Agglomerative Clustering (HAC) is a widely-used clustering method based on repeatedly merging the closest pair of clusters, where inter-cluster distances are determined by a linkage function. Unlike many clustering methods, HAC…

Machine Learning · Computer Science 2026-02-12 Kishen N Gowda , Willem Fletcher , MohammadHossein Bateni , Laxman Dhulipala , D Ellis Hershkowitz , Rajesh Jayaram , Jakub Łącki

ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain

This paper studies the hierarchical clustering problem, where the goal is to produce a dendrogram that represents clusters at varying scales of a data set. We propose the ParChain framework for designing parallel hierarchical agglomerative…

Data Structures and Algorithms · Computer Science 2022-02-15 Shangdi Yu , Yiqiu Wang , Yan Gu , Laxman Dhulipala , Julian Shun

Fair Algorithms for Hierarchical Agglomerative Clustering

Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science, and seek to partition the dataset into clusters while generating a hierarchical relationship between the data samples. HAC algorithms…

Machine Learning · Computer Science 2023-08-01 Anshuman Chhabra , Prasant Mohapatra

DynHAC: Fully Dynamic Approximate Hierarchical Agglomerative Clustering

We consider the problem of maintaining a hierarchical agglomerative clustering (HAC) in the dynamic setting, when the input is subject to point insertions and deletions. We introduce DynHAC - the first dynamic HAC algorithm for the popular…

Data Structures and Algorithms · Computer Science 2025-01-15 Shangdi Yu , Laxman Dhulipala , Jakub Łącki , Nikos Parotsidis

Agglomerative Likelihood Clustering

We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables we present an updated fast non-expensive Agglomerative Likelihood Clustering algorithm…

Computational Finance · Quantitative Finance 2022-03-22 Lionel Yelibi , Tim Gebbie

Data Aggregation for Hierarchical Clustering

Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method, because it can be used with many distances, similarities, and various linkage strategies. It is often used when the number of clusters…

Machine Learning · Statistics 2023-09-07 Erich Schubert , Andreas Lang

An Automatic Clustering Technique for Optimal Clusters

This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets automatically. The AMOC is an extension…

Computer Vision and Pattern Recognition · Computer Science 2011-09-07 K. Karteeka Pavan , Allam Appa Rao , A. V. Dattatreya Rao

On Randomly Projected Hierarchical Clustering with Guarantees

Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average…

Information Retrieval · Computer Science 2014-01-24 Johannes Schneider , Michail Vlachos

Scalable Exact Hierarchical Agglomerative Clustering via Sparse Geographic Distance Graphs

Exact hierarchical agglomerative clustering (HAC) of large spatial datasets is limited in practice by the $\mathcal{O}(n^2)$ time and memory required for the full pairwise distance matrix. We present GSHAC (Geographically Sparse…

Data Structures and Algorithms · Computer Science 2026-04-14 Victor Maus , Vinicius Pozzobon Borin

A Fast Synchronization Clustering Algorithm

This paper presents a Fast Synchronization Clustering algorithm (FSynC), which is an improved version of SynC algorithm. In order to decrease the time complexity of the original SynC algorithm, we combine grid cell partitioning method and…

Machine Learning · Computer Science 2014-07-29 Xinquan Chen

Exact clustering in linear time

The time complexity of data clustering has been viewed as fundamentally quadratic, slowing with the number of data items, as each item is compared for similarity to preceding items. Clustering of large data sets has been infeasible without…

Data Structures and Algorithms · Computer Science 2017-02-28 Jonathan A. Marshall , Lawrence C. Rafsky

Hierarchical Clustering better than Average-Linkage

Hierarchical Clustering (HC) is a widely studied problem in exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. In this paper we focus on two objectives,…

Data Structures and Algorithms · Computer Science 2018-08-08 Moses Charikar , Vaggos Chatziafratis , Rad Niazadeh

Local Search for Clustering in Almost-linear Time

We propose the first local search algorithm for Euclidean clustering that attains an $O(1)$-approximation in almost-linear time. Specifically, for Euclidean $k$-Means, our algorithm achieves an $O(c)$-approximation in $\tilde{O}(n^{1…

Data Structures and Algorithms · Computer Science 2025-04-07 Shaofeng H.-C. Jiang , Yaonan Jin , Jianing Lou , Pinyan Lu

Adjacency-constrained hierarchical clustering of a band similarity matrix with application to Genomics

Motivation: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution,…

Statistics Theory · Mathematics 2019-02-06 Christophe Ambroise , Alia Dehman , Pierre Neuvial , Guillem Rigaill , Nathalie Vialaneix