Related papers: ABCDE: Application-Based Cluster Diff Evals

More Clustering Quality Metrics for ABCDE

ABCDE is a technique for evaluating clusterings of very large populations of items. Given two clusterings, namely a Baseline clustering and an Experiment clustering, ABCDE can characterize their differences with impact and quality metrics,…

Information Retrieval · Computer Science 2024-09-23 Stephan van Staden

Pointwise Metrics for Clustering Evaluation

This paper defines pointwise clustering metrics, a collection of metrics for characterizing the similarity of two clusterings. These metrics have several interesting properties which make them attractive for practical applications. They can…

Information Retrieval · Computer Science 2024-05-20 Stephan van Staden

Decomposing the Jaccard Distance and the Jaccard Index in ABCDE

ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric that characterizes the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric…

Information Retrieval · Computer Science 2024-09-30 Stephan van Staden

Evaluation of Cluster Id Assignment Schemes with ABCDE

A cluster id assignment scheme labels each cluster of a clustering with a distinct id. The goal of id assignment is semantic id stability, which means that, whenever possible, a cluster for the same underlying concept as that of a…

Information Retrieval · Computer Science 2024-09-30 Stephan van Staden

Cluster validation by measurement of clustering characteristics relevant to the user

There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about the evaluation of the quality of a clustering; "relative cluster validation" is about using such criteria…

Methodology · Statistics 2020-09-10 Christian Hennig

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng

Reclustering: A New Method to Test the Appropriate Level of Clustering

When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For…

Methodology · Statistics 2025-11-12 Kentaro Fukumoto

Document Clustering Evaluation: Divergence from a Random Baseline

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no…

Information Retrieval · Computer Science 2012-08-30 Christopher M. De Vries, Shlomo Geva, Andrew Trotman

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice…

Methodology · Statistics 2020-06-24 Serhat Emre Akhanli, Christian Hennig

On Clustering on Graphs with Multiple Edge Types

We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics. For instance similarity between two papers can be based on common authors, where…

Social and Information Networks · Computer Science 2011-09-09 Matthew Rocklin, Ali Pinar

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson, M. Ackerman, N. C. Brownstein

A matching based clustering algorithm for categorical data

Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…

Machine Learning · Computer Science 2018-12-11 Ruben A. Gevorgyan, Yenok B. Hakobyan

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets that can be clustered well. More generally, despite the ubiquity and the…

Machine Learning · Computer Science 2012-05-23 Amit Daniely, Nati Linial, Michael Saks

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace, Kalyani Desikan

A Critical Note on the Evaluation of Clustering Algorithms

Experimental evaluation is a major research methodology for investigating clustering algorithms and many other machine learning algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature and their…

Machine Learning · Computer Science 2019-10-21 Tiantian Zhang, Li Zhong, Bo Yuan

Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach

Clustering is an unsupervised machine learning methodology where unlabeled elements/objects are grouped together aiming to the construction of well-established clusters that their elements are classified according to their similarity. The…

Machine Learning · Statistics 2023-10-20 Dimitrios Saligkaras, Vasileios E. Papageorgiou

Evaluating and Validating Cluster Results

Clustering is the technique to partition data according to their characteristics. Data that are similar in nature belong to the same cluster [1]. There are two types of evaluation methods to evaluate clustering quality. One is an external…

Machine Learning · Computer Science 2024-09-05 Anupriya Vysala, Joseph Gomes

A clustering approach for pairwise comparison matrices

We consider clustering in group decision making where the opinions are given by pairwise comparison matrices. In particular, the k-medoids model is suggested to classify the matrices since it has a linear programming problem formulation…

Optimization and Control · Mathematics 2025-04-17 Kolos Csaba Ágoston, Sándor Bozóki, László Csató