Related papers: The cluster structure function
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal…
We introduce a new clustering method for the classification of functional data sets by their probabilistic law, that is, a procedure that aims to assign data sets to the same cluster if and only if the data were generated with the same…
Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate…
Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose…
We present a novel approach for finding and evaluating structural models of small metallic nanoparticles. Rather than fitting a single model with many degrees of freedom, the approach algorithmically builds libraries of nanoparticle…
The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…
Clustering functional data is a challenging task due to its intrinsic infinite dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…
The clustering of a data set is one of the core tasks in data analytics. Many clustering algorithms exhibit a strong contrast between favorable performance in practice and poor theoretical worst-case behavior. Prime examples are least-squares…
A main task in data analysis is to organize data points into coherent groups or clusters. The stochastic block model is a probabilistic model for the cluster structure. This model prescribes different probabilities for the presence of edges…
The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a…
Functional data analysis involves data described by regular functions rather than by a finite number of real valued variables. While some robust data analysis methods can be applied directly to the very high dimensional vectors obtained…
Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets that can be clustered well. More generally, despite the ubiquity and the…
Clustering is one of the fundamental tasks in data analytics and machine learning. In many situations, different clusterings of the same data set become relevant. For example, different algorithms for the same clustering task may return…
Identifying the number $K$ of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of $K$ that correctly characterizes the features of the data is essential for building meaningful clusters. In this…
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing…
Functional data clustering aims to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Applications of functional data clustering have appeared in many publications across…
Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is…
We formulate a novel technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines…
We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into $K$ clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of…
The problem of estimating the number of clusters (say k) is one of the major challenges for partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…