Related papers: Initialization methods for optimum average silhoue…

200 papers

The Average Silhouette Width (ASW; Rousseeuw (1987)) is a popular cluster validation index to estimate the number of clusters. Here we address the question of whether it is also suitable as a general objective function to be optimized for…

Machine Learning · Statistics 2020-11-24 Fatima Batool, Christian Hennig

An agglomerative hierarchical clustering (AHC) framework and algorithm named HOSil based on a new linkage metric optimized by the average silhouette width (ASW) index is proposed. A conscientious investigation of various clustering methods…

Methodology · Statistics 2019-09-30 Fatima Batool

The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in the range [-1,1]. The average silhouette width (ASW) is a widely used internal measure…

Machine Learning · Computer Science 2026-03-23 Hugo Sträng, Tai Dinh

The most widely used internal measure for clustering evaluation is the silhouette coefficient, whose naive computation requires a quadratic number of distance calculations, which is clearly unfeasible for massive datasets. Surprisingly,…

Data Structures and Algorithms · Computer Science 2021-01-21 Federico Altieri, Andrea Pietracaprina, Geppino Pucci, Fabio Vandin

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate…

Machine Learning · Computer Science 2023-10-17 Lars Lenssen, Erich Schubert

Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the…

Machine Learning · Computer Science 2024-06-25 John Pavlopoulos, Georgios Vardakas, Aristidis Likas

The problem of estimating the number of clusters (say k) is one of the major challenges in partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…

Machine Learning · Computer Science 2025-01-28 Duy-Tai Dinh, Tsutomu Fujinami, Van-Nam Huynh

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate…

Machine Learning · Computer Science 2022-09-27 Lars Lenssen, Erich Schubert

This research introduces a new strategy in cluster ensemble selection by using Independency and Diversity metrics. In recent years, Diversity and Quality, which are two metrics in evaluation procedure, have been used for selecting basic…

Machine Learning · Statistics 2016-10-11 Muhammad Yousefnezhad, Ali Reihanian, Daoqiang Zhang, Behrouz Minaei-Bidgoli

A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of clustering of a data set given in any dimensional space. Our validity index applies the classical nonparametric…

Methodology · Statistics 2022-02-15 Soumita Modak

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the…

Machine Learning · Computer Science 2025-12-01 Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy…

Machine Learning · Computer Science 2025-07-14 Krishnendu Das, Sumit Gupta, Awadhesh Kumar

We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto and Hennig 2016) of a Gaussian mixture model allowing for…

Methodology · Statistics 2020-12-29 Christian Hennig, Pietro Coretto

The main contribution of the paper is a new approach to subspace clustering that is significantly more computationally efficient and scalable than existing state-of-the-art methods. The central idea is to modify the regression technique in…

Machine Learning · Statistics 2018-07-11 Urvashi Oswal, Robert Nowak

Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard…

Machine Learning · Computer Science 2026-04-16 Aggelos Semoglou, Aristidis Likas, John Pavlopoulos

Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Most recent results focus primarily on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian…

Statistics Theory · Mathematics 2024-10-24 Soham Jana, Jianqing Fan, Sanjeev Kulkarni

A novel nonparametric clustering algorithm is proposed using the interpoint distances between the members of the data to reveal the inherent clustering structure existing in the given set of data, where we apply the classical nonparametric…

Methodology · Statistics 2024-09-02 Soumita Modak

Time-series clustering serves as a powerful data mining technique for time-series data in the absence of prior knowledge about clusters. A large amount of time-series data with large size has been acquired and used in various research…

Signal Processing · Electrical Eng. & Systems 2024-05-21 Tomoki Inoue, Koyo Kubota, Tsubasa Ikami, Yasuhiro Egami, Hiroki Nagai, Takahiro Kashikawa, Koichi Kimura, Yu Matsuda

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice…

Methodology · Statistics 2020-06-24 Serhat Emre Akhanli, Christian Hennig

Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision-making by uncovering relationships between different simulated systems and between a system's inputs and outputs. We focus on…

Methodology · Statistics 2026-05-28 Mohammadmahdi Ghasemloo, David J. Eckman