Related papers: Layers and stability
HDBSCAN is a density-based clustering algorithm that constructs a cluster hierarchy tree and then uses a specific stability measure to extract flat clusters from the tree. We show how the application of an additional threshold value can…
We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN*…
This paper tries to present a more unified view of clustering, by identifying the relationships between five different clustering algorithms. Some of the results are not new, but they are presented in a cleaner, simpler and more concise…
Most community detection approaches make very strong assumptions about communities in the data, such as every vertex must belong to exactly one community (the communities form a partition). For vector data, Hierarchical Density Based…
Hierarchical clustering is a common algorithm in data analysis. It is unique among many clustering algorithms in that it draws dendrograms based on the distances between data points under a certain metric and groups them accordingly. It is widely used in all areas…
In the first half of this paper, we generalize the theory of layer points for Lesnick- (or degree-Rips-) complexes to the more general context of $\vec{v}$-hierarchical clusterings. Layer points provide a compressed description of a…
Economic policy and research rely on the correct evaluation of the billions of high-frequency data points that we collect every day. Consistent clustering algorithms, like DBSCAN, allow us to make sense of the data in a useful way. However,…
Clustering is a cornerstone of modern data analysis. Detecting clusters in exploratory data analyses (EDA) requires algorithms that make few assumptions about the data. Density-based clustering algorithms are particularly well-suited for…
Clustering algorithms are often used to find subpopulations in exploratory data analysis workflows. Not only the clusters themselves, but also their shape can represent meaningful subpopulations. In this paper, we present FLASC, an…
This work incorporates topological features via persistence diagrams to classify point cloud data arising from materials science. Persistence diagrams are multisets summarizing the connectedness and holes of given data. A new distance on…
A hierarchy structure is introduced in a natural way on directed graphs with weighted adjacencies. Embedded system of algebras of subsets of the set of vertices of such a digraph and its consolidations, whose vertices are the elementary sets…
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the…
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts in the sense that a…
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets whose clusters have constant point density. One of the significant attributes…
DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN,…
Lensing by galaxy clusters is a versatile probe of cosmology and extragalactic astrophysics, but the accuracy of some of its predictions is limited by the simplified models adopted to reduce the (otherwise intractable) number of degrees of…
In this paper we explore the use of spatial clustering algorithms as a new computational approach for modeling the cosmic web. We demonstrate that such algorithms are efficient in terms of computing time needed. We explore three distinct…
Clustering is an unsupervised technique for grouping data points by similarity. While explainability methods exist for supervised machine learning, they are not directly applicable to clustering, making it challenging to understand cluster…
We establish a hierarchy of Euclidean stars according to their degree of complexity, as measured by the complexity factor and the complexity of the pattern of evolution. We consider both nondissipative and dissipative systems. Solutions…
DBSCAN* and HDBSCAN* are well established density based clustering algorithms. However, obtaining the clusters of very large datasets is infeasible, limiting their use in real world applications. By exploiting the geometry of Euclidean…