Related papers: Initialization methods for optimum average silhoue…

Clustering with the Average Silhouette Width

The Average Silhouette Width (ASW; Rousseeuw (1987)) is a popular cluster validation index to estimate the number of clusters. Here we address the question whether it also is suitable as a general objective function to be optimized for…

Machine Learning · Statistics 2020-11-24 Fatima Batool, Christian Hennig

An agglomerative hierarchical clustering method by optimizing the average silhouette width

An agglomerative hierarchical clustering (AHC) framework and algorithm named HOSil based on a new linkage metric optimized by the average silhouette width (ASW) index is proposed. A conscientious investigation of various clustering methods…

Methodology · Statistics 2019-09-30 Fatima Batool

An upper bound on the silhouette evaluation metric for clustering

The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in the range [-1,1]. The average silhouette width (ASW) is a widely used internal measure…

Machine Learning · Computer Science 2026-03-23 Hugo Sträng, Tai Dinh

Scalable Distributed Approximation of Internal Measures for Clustering Evaluation

The most widely used internal measure for clustering evaluation is the silhouette coefficient, whose naive computation requires a quadratic number of distance calculations, which is clearly unfeasible for massive datasets. Surprisingly,…

Data Structures and Algorithms · Computer Science 2021-01-21 Federico Altieri, Andrea Pietracaprina, Geppino Pucci, Fabio Vandin

Medoid Silhouette clustering with automatic cluster number selection

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate…

Machine Learning · Computer Science 2023-10-17 Lars Lenssen, Erich Schubert

Revisiting Silhouette Aggregation

Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the…

Machine Learning · Computer Science 2024-06-25 John Pavlopoulos, Georgios Vardakas, Aristidis Likas

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

The problem of estimating the number of clusters (say k) is one of the major challenges for the partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…

Machine Learning · Computer Science 2025-01-28 Duy-Tai Dinh, Tsutomu Fujinami, Van-Nam Huynh

Clustering by Direct Optimization of the Medoid Silhouette

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate…

Machine Learning · Computer Science 2022-09-27 Lars Lenssen, Erich Schubert

A new selection strategy for selective cluster ensemble based on Diversity and Independency

This research introduces a new strategy in cluster ensemble selection by using Independency and Diversity metrics. In recent years, Diversity and Quality, which are two metrics in evaluation procedure, have been used for selecting basic…

Machine Learning · Statistics 2016-10-11 Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli

A new measure for assessment of clustering based on kernel density estimation

A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of clustering of a data set given in any dimensional space. Our validity index applies the classical nonparametric…

Methodology · Statistics 2022-02-15 Soumita Modak

SACA: Selective Attention-Based Clustering Algorithm

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…

Machine Learning · Computer Science 2025-12-01 Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy…

Machine Learning · Computer Science 2025-07-14 Krishnendu Das, Sumit Gupta, Awadhesh Kumar

An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture based clustering

We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto and Hennig 2016) of a Gaussian mixture model allowing for…

Methodology · Statistics 2020-12-29 Christian Hennig, Pietro Coretto

Scalable Sparse Subspace Clustering via Ordered Weighted $\ell_1$ Regression

The main contribution of the paper is a new approach to subspace clustering that is significantly more computationally efficient and scalable than existing state-of-the-art methods. The central idea is to modify the regression technique in…

Machine Learning · Statistics 2018-07-11 Urvashi Oswal, Robert Nowak

Composite Silhouette: A Subsampling-based Aggregation Strategy

Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard…

Machine Learning · Computer Science 2026-04-16 Aggelos Semoglou, Aristidis Likas, John Pavlopoulos

A provable initialization and robust clustering method for general mixture models

Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Most recent results focus primarily on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian…

Statistics Theory · Mathematics 2024-10-24 Soham Jana, Jianqing Fan, Sanjeev Kulkarni

A new interpoint distance-based clustering algorithm using kernel density estimation

A novel nonparametric clustering algorithm is proposed using the interpoint distances between the members of the data to reveal the inherent clustering structure existing in the given set of data, where we apply the classical nonparametric…

Methodology · Statistics 2024-09-02 Soumita Modak

Clustering Method for Time-Series Images Using Quantum-Inspired Computing Technology

Time-series clustering serves as a powerful data mining technique for time-series data in the absence of prior knowledge about clusters. A large amount of time-series data with large size has been acquired and used in various research…

Signal Processing · Electrical Eng. & Systems 2024-05-21 Tomoki Inoue, Koyo Kubota, Tsubasa Ikami, Yasuhiro Egami, Hiroki Nagai, Takahiro Kashikawa, Koichi Kimura, Yu Matsuda

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice…

Methodology · Statistics 2020-06-24 Serhat Emre Akhanli, Christian Hennig

An Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance

Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision-making by uncovering relationships between different simulated systems and between a system's inputs and outputs. We focus on…

Methodology · Statistics 2026-05-28 Mohammadmahdi Ghasemloo, David J. Eckman