Related papers: Data ultrametricity and clusterability

Ultrametric embedding: application to data fingerprinting and to fast data clustering

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or…

Statistics Theory · Mathematics 2011-01-11 Fionn Murtagh

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson , M. Ackerman , N. C. Brownstein

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets {\em that can be clustered well}. More generally, despite the ubiquity and the…

Machine Learning · Computer Science 2012-05-23 Amit Daniely , Nati Linial , Michael Saks

Ultrametric Fitting by Gradient Descent

We study the problem of fitting an ultrametric distance to a dissimilarity graph in the context of hierarchical cluster analysis. Standard hierarchical clustering methods are specified procedurally, rather than in terms of the cost function…

Machine Learning · Computer Science 2021-02-03 Giovanni Chierchia , Benjamin Perret

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Issues,Challenges and Tools of Clustering Algorithms

Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure.…

Information Retrieval · Computer Science 2011-10-13 Parul Agarwal , M. Afshar Alam , Ranjit Biswas

Ultrametric Cluster Hierarchies: I Want 'em All!

Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas by proving that, for any reasonable hierarchy, one…

Machine Learning · Computer Science 2025-11-13 Andrew Draganov , Pascal Weber , Rasmus Skibdahl Melanchton Jørgensen , Anna Beer , Claudia Plant , Ira Assent

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous , Srinjay Sarkar , Abubakar Abid , James Zou

A matching based clustering algorithm for categorical data

Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…

Machine Learning · Computer Science 2018-12-11 Ruben A. Gevorgyan , Yenok B. Hakobyan

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet,…

Machine Learning · Computer Science 2016-02-24 Margareta Ackerman , Andreas Adolfsson , Naomi Brownstein

Ultrametric Component Analysis with Application to Analysis of Text and of Emotion

We review the theory and practice of determining what parts of a data set are ultrametric. It is assumed that the data set, to begin with, is endowed with a metric, and we include discussion of how this can be brought about if a…

Artificial Intelligence · Computer Science 2013-09-17 Fionn Murtagh

An expressive dissimilarity measure for relational clustering using neighbourhood trees

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between…

Machine Learning · Statistics 2017-09-29 Sebastijan Dumancic , Hendrik Blockeel

Transferable Deep Metric Learning for Clustering

Clustering in high dimension spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial, and it is highly dependent on the dataset…

Machine Learning · Computer Science 2023-02-14 Simo Alami. C , Rim Kaddah , Jesse Read

Clustering categorical data via ensembling dissimilarity matrices

We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other…

Machine Learning · Statistics 2017-09-20 Saeid Amiri , Bertrand Clarke , Jennifer Clarke

Information based clustering

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim , Gurinder Singh Atwal , Gasper Tkacik , William Bialek

Clustering validity based on the most similarity

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh , Farzad Didehvar , Zahra Shojaei

Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He , Xiaofei Xu , Shengchun Deng

Benchmark and application of unsupervised classification approaches for univariate data

Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific…

Mesoscale and Nanoscale Physics · Physics 2021-03-23 Maria El Abbassi , Jan Overbeck , Oliver Braun , Michel Calame , Herre S. J. van der Zant , Mickael L. Perrin

Introduction to Clustering Algorithms and Applications

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different…

Machine Learning · Computer Science 2014-08-26 Sibei Yang , Liangde Tao , Bingchen Gong

Toward Multidiversified Ensemble Clustering of High-Dimensional Data: From Subspaces to Metrics and Beyond

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, recently considerable efforts in ensemble clustering have been made…

Machine Learning · Computer Science 2021-09-07 Dong Huang , Chang-Dong Wang , Jian-Huang Lai , Chee-Keong Kwoh