FLASC: A Flare-Sensitive Clustering Algorithm

Machine Learning 2025-04-23 v2 Databases

Abstract

Clustering algorithms are often used to find subpopulations in exploratory data analysis workflows. Not only the clusters themselves but also their shapes can represent meaningful subpopulations. In this paper, we present FLASC, an algorithm that detects branches within clusters to identify such subpopulations. FLASC builds upon HDBSCAN*, a state-of-the-art density-based clustering algorithm, and detects branches in a post-processing step that describes within-cluster connectivity. We present two variants of the algorithm, which trade computational cost for noise robustness. Using synthetic data sets, we show that both variants scale similarly to HDBSCAN* in computational cost and provide stable outputs, resulting in an efficient flare-sensitive clustering algorithm. In addition, we demonstrate the benefit of branch detection on two real-world data sets.

Cite

@article{arxiv.2311.15887,
  title   = {FLASC: A Flare-Sensitive Clustering Algorithm},
  author  = {D. M. Bot and J. Peeters and J. Liesenborgs and J. Aerts},
  journal = {arXiv preprint arXiv:2311.15887},
  year    = {2025}
}

Comments

Previously 20 pages, 11 figures, submitted to ACM TKDD. Now 15 pages, 8 figures, submitted to PeerJ Computer Science (method simplified and text rewritten for clarity).