Related papers: Data ultrametricity and clusterability

200 papers

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or…

Statistics Theory · Mathematics 2011-01-11 Fionn Murtagh

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability,…

Machine Learning · Statistics 2018-10-30 A. Adolfsson, M. Ackerman, N. C. Brownstein

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets that can be clustered well. More generally, despite the ubiquity and the…

Machine Learning · Computer Science 2012-05-23 Amit Daniely, Nati Linial, Michael Saks

We study the problem of fitting an ultrametric distance to a dissimilarity graph in the context of hierarchical cluster analysis. Standard hierarchical clustering methods are specified procedurally, rather than in terms of the cost function…

Machine Learning · Computer Science 2021-02-03 Giovanni Chierchia, Benjamin Perret

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure.…

Information Retrieval · Computer Science 2011-10-13 Parul Agarwal, M. Afshar Alam, Ranjit Biswas

Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas by proving that, for any reasonable hierarchy, one…

Machine Learning · Computer Science 2025-11-13 Andrew Draganov, Pascal Weber, Rasmus Skibdahl Melanchton Jørgensen, Anna Beer, Claudia Plant, Ira Assent

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

Cluster analysis is one of the essential tasks in data mining and knowledge discovery. Each type of data poses unique challenges in achieving relatively efficient partitioning of the data into homogeneous groups. While the algorithms for…

Machine Learning · Computer Science 2018-12-11 Ruben A. Gevorgyan, Yenok B. Hakobyan

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet,…

Machine Learning · Computer Science 2016-02-24 Margareta Ackerman, Andreas Adolfsson, Naomi Brownstein

We review the theory and practice of determining what parts of a data set are ultrametric. It is assumed that the data set, to begin with, is endowed with a metric, and we include discussion of how this can be brought about if a…

Artificial Intelligence · Computer Science 2013-09-17 Fionn Murtagh

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between…

Machine Learning · Statistics 2017-09-29 Sebastijan Dumancic, Hendrik Blockeel

Clustering in high dimension spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial, and it is highly dependent on the dataset…

Machine Learning · Computer Science 2023-02-14 Simo Alami. C, Rim Kaddah, Jesse Read

We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other…

Machine Learning · Statistics 2017-09-20 Saeid Amiri, Bertrand Clarke, Jennifer Clarke

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…

Quantitative Methods · Quantitative Biology 2009-11-11 Noam Slonim, Gurinder Singh Atwal, Gasper Tkacik, William Bialek

One basic requirement of many studies is the necessity of classifying data. Clustering is a proposed method for summarizing networks. Clustering methods can be divided into two categories named model-based approaches and algorithmic…

Machine Learning · Computer Science 2013-02-19 Raheleh Namayandeh, Farzad Didehvar, Zahra Shojaei

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes.…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He, Xiaofei Xu, Shengchun Deng

Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific…

Mesoscale and Nanoscale Physics · Physics 2021-03-23 Maria El Abbassi, Jan Overbeck, Oliver Braun, Michel Calame, Herre S. J. van der Zant, Mickael L. Perrin

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different…

Machine Learning · Computer Science 2014-08-26 Sibei Yang, Liangde Tao, Bingchen Gong

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, recently considerable efforts in ensemble clustering have been made…

Machine Learning · Computer Science 2021-09-07 Dong Huang, Chang-Dong Wang, Jian-Huang Lai, Chee-Keong Kwoh