Related papers

Related papers: Optimal Parallel Algorithms for Dendrogram Computa…

200 papers

Single-linkage clustering is a popular form of hierarchical agglomerative clustering (HAC) where the distance between two clusters is defined as the minimum distance between any pair of points across the two clusters. In single-linkage HAC,…

Data Structures and Algorithms · Computer Science 2025-06-24 Quinten De Man, Laxman Dhulipala, Kishen N Gowda
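The single-linkage definition in the abstract above can be made concrete with a brute-force sketch. It is illustrative only: `single_linkage_distance` is a hypothetical helper, and it assumes points are coordinate tuples compared under Euclidean distance. The abstract itself does not fix a metric.

```python
import math
from itertools import product


def single_linkage_distance(cluster_a, cluster_b):
    """Single-linkage distance: the minimum distance over all cross-cluster pairs.

    Assumes each cluster is a non-empty list of coordinate tuples and uses
    Euclidean distance (math.dist) as the pairwise metric.
    """
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (10.0, 0.0)]
print(single_linkage_distance(a, b))  # 3.0, from the pair (1, 0) and (4, 0)
```

Checking every pair costs O(|A|·|B|) distance evaluations per cluster pair, which is why the listed papers build on MSTs and other structures instead of evaluating this formula directly.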

This paper presents PANDORA, a novel parallel algorithm for efficiently constructing dendrograms for single-linkage hierarchical clustering, including HDBSCAN. Traditional dendrogram construction methods from a minimum spanning tree…

Machine Learning · Computer Science 2025-04-29 Piyush Sao, Andrey Prokopenko, Damien Lebrun-Grandié

This paper presents new parallel algorithms for generating Euclidean minimum spanning trees and spatial clustering hierarchies (known as HDBSCAN$^*$). Our approach is based on generating a well-separated pair decomposition followed by using…

Data Structures and Algorithms · Computer Science 2021-04-05 Yiqiu Wang, Shangdi Yu, Yan Gu, Julian Shun

We address the problem of computing a single linkage dendrogram. A possible approach is to: (i) Form an edge weighted graph $G$ over the data, with edge weights reflecting dissimilarities. (ii) Calculate the MST $T$ of $G$. (iii) Break the…

Data Structures and Algorithms · Computer Science 2019-11-04 Huanbiao Zhu, Werner Stuetzle
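Steps (i) and (ii) of the Zhu and Stuetzle abstract can be sketched with Kruskal's algorithm, because the order in which Kruskal unions components is exactly the single-linkage merge order. Step (iii) is truncated in the abstract, so this sketch does not try to reconstruct it; it simply records each accepted MST edge as one dendrogram merge. `single_linkage_merges` and its `(cluster_a, cluster_b, height, size)` tuple layout are hypothetical (loosely in the spirit of SciPy's linkage matrix), and Euclidean weights on the complete graph are an assumption.

```python
import math
from itertools import combinations


def single_linkage_merges(points):
    """Return single-linkage merges as (cluster_a, cluster_b, height, size).

    Leaves are numbered 0..n-1; each merge creates a new node n, n+1, ...
    """
    n = len(points)
    parent = list(range(n))       # union-find parent pointers
    cluster_id = list(range(n))   # dendrogram node currently held by each root
    size = [1] * n                # number of points under each root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Step (i): complete graph over the data, weighted by Euclidean distance.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))

    merges, next_id = [], n
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge would close a cycle, so it is not an MST edge
        # Step (ii): an accepted edge is an MST edge and also one merge.
        merges.append((cluster_id[ri], cluster_id[rj], w, size[ri] + size[rj]))
        parent[rj] = ri
        size[ri] += size[rj]
        cluster_id[ri] = next_id
        next_id += 1
        if len(merges) == n - 1:
            break  # the MST is complete
    return merges


print(single_linkage_merges([(0, 0), (1, 0), (5, 0), (6, 0)]))
# [(0, 1, 1.0, 2), (2, 3, 1.0, 2), (4, 5, 4.0, 4)]
```

Sorting all O(n²) edges keeps the sketch short but is far from the parallel and output-sensitive bounds the listed papers target; it is meant only to show why an MST determines the dendrogram.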

This paper presents new deterministic and distributed low-diameter decomposition algorithms for weighted graphs. In particular, we show that if one can efficiently compute approximate distances in a parallel or a distributed setting, one…

Data Structures and Algorithms · Computer Science 2022-09-07 Václav Rozhoň, Michael Elkin, Christoph Grunau, Bernhard Haeupler

We present the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of an SDD $n$-by-$n$ matrix $A$ with $m$ non-zero entries and a vector $b$, our algorithm…

Data Structures and Algorithms · Computer Science 2011-11-09 Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L. Miller, Richard Peng, Kanat Tangwongsan

Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets…

Machine Learning · Computer Science 2025-04-01 Bingyuan Zhang, Yoshikazu Terada

One of the main challenges for hierarchical clustering is how to appropriately identify the representative points in the lower level of the cluster tree, which are going to be utilized as the roots in the higher level of the cluster tree…

Machine Learning · Statistics 2021-11-16 Wen-Bo Xie, Zhen Liu, Jaideep Srivastava

Hierarchical clustering and community detection are important problems in machine learning and complex network analysis. A common approach to identify clusters is to simply cut dendrograms at some threshold. However, single-level cuts are…

Physics and Society · Physics 2025-12-10 Louis Boucherie, Yong-Yeol Ahn, Sune Lehmann

We derive a statistical model for estimation of a dendrogram from single linkage hierarchical clustering (SLHC) that takes account of uncertainty through noise or corruption in the measurements of separation of data. Our focus is on just…

Machine Learning · Statistics 2015-11-26 Dekang Zhu, Dan P. Guralnik, Xuezhi Wang, Xiang Li, Bill Moran

Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms…

Data Structures and Algorithms · Computer Science 2018-12-06 Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth

Clustering multidimensional points is a fundamental data mining task, with applications in many fields, such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-22 Yihao Huang, Shangdi Yu, Julian Shun

Previously, we proposed a physically-inspired method to construct data points into an effective in-tree (IT) structure, in which the underlying cluster structure in the dataset is well revealed. Although there are some edges in the IT…

Machine Learning · Statistics 2015-07-30 Teng Qiu, Yongjie Li

The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two minimum spanning tree based clustering algorithms. The first algorithm produces k clusters with center…

Other Computer Science · Computer Science 2010-05-26 S. John Peter, S. P. Victor
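The abstract above is truncated before it describes either of its algorithms, so the following is the textbook MST clustering rule rather than the authors' method: build the MST and delete its k-1 heaviest edges, which (up to ties in edge weights) is the same as stopping Kruskal's algorithm once k components remain. `mst_k_clusters` is a hypothetical name, and Euclidean distance is assumed.

```python
import math
from itertools import combinations


def mst_k_clusters(points, k):
    """Assign each point a label in 0..k-1 by running Kruskal until k components remain.

    Equivalent (up to ties) to building the Euclidean MST and deleting its
    k-1 heaviest edges; the surviving components are the clusters.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")
    parent = list(range(n))  # union-find parent pointers

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    components = n
    for _, i, j in edges:
        if components == k:
            break  # the remaining MST edges are the k-1 heaviest; skip them
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1

    # Relabel component roots as 0..k-1 in order of first appearance.
    root_to_label, labels = {}, []
    for i in range(n):
        labels.append(root_to_label.setdefault(find(i), len(root_to_label)))
    return labels


pts = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
print(mst_k_clusters(pts, 2))  # [0, 0, 0, 0, 1]
```

Because clusters grow along chains of short edges, this rule can follow elongated or irregular shapes, which matches the property the abstract attributes to MST clustering.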

This paper studies the hierarchical clustering problem, where the goal is to produce a dendrogram that represents clusters at varying scales of a data set. We propose the ParChain framework for designing parallel hierarchical agglomerative…

Data Structures and Algorithms · Computer Science 2022-02-15 Shangdi Yu, Yiqiu Wang, Yan Gu, Laxman Dhulipala, Julian Shun

Modern trends in data collection are bringing current mainstream techniques for database query processing to their limits. Consequently, various novel approaches for efficient query processing are being actively studied. One such approach…

Databases · Computer Science 2022-04-13 Georg Gottlob, Matthias Lanzinger, Cem Okulmus, Reinhard Pichler

Parallelism has become a central concern in modern decoding frameworks aiming to meet stringent throughput and latency requirements. Guessing Random Additive Noise Decoding (GRAND) is a recently proposed decoding paradigm that tests…

Information Theory · Computer Science 2026-05-04 Li Wan, Huarui Yin, Wenyi Zhang

We show fast deterministic algorithms for fundamental problems on forests in the challenging low-space regime of the well-known Massive Parallel Computation (MPC) model. A recent breakthrough result by Coy and Czumaj [STOC'22] shows that,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-08 Alkida Balliu, Rustam Latypov, Yannic Maus, Dennis Olivetti, Jara Uitto

We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure. Our proposed method bridges agglomerative hierarchical clustering and…

Methodology · Statistics 2024-03-11 Dat Do, Linh Do, Scott A. McKinley, Jonathan Terhorst, XuanLong Nguyen

Search trees on trees (STTs) generalize the fundamental binary search tree (BST) data structure: in STTs the underlying search space is an arbitrary tree, whereas in BSTs it is a path. An optimal BST of size $n$ can be computed for a given…

Data Structures and Algorithms · Computer Science 2022-09-19 Benjamin Aram Berendsohn, Ishay Golinsky, Haim Kaplan, László Kozma