Related papers: The cluster structure function

Kolmogorov's Structure Functions and Model Selection

In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal…

Computational Complexity · Computer Science 2007-05-23 Nikolai Vereshchagin, Paul Vitanyi

Clustering Sets of Functional Data by Similarity in Law

We introduce a new clustering method for the classification of functional data sets by their probabilistic law, that is, a procedure that aims to assign data sets to the same cluster if and only if the data were generated with the same…

Methodology · Statistics 2023-12-29 Antonio Galves, Fernando Najman, Marcela Svarc, Claudia D. Vargas

How many clusters? An information theoretic perspective

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Susanne Still, William Bialek

Classifying Clustering Schemes

Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose…

Machine Learning · Statistics 2010-12-01 Gunnar Carlsson, Facundo Memoli

Cluster-mining: An approach for determining core structures of metallic nanoparticles from atomic pair distribution function data

We present a novel approach for finding and evaluating structural models of small metallic nanoparticles. Rather than fitting a single model with many degrees of freedom, the approach algorithmically builds libraries of nanoparticle…

Materials Science · Physics 2019-01-28 Soham Banerjee, Chia-Hao Liu, Kirsten M. O. Jensen, Pavol Juhas, Jennifer D. Lee, Marcus Tofanelli, Christopher J. Ackerson, Christopher B. Murray, Simon J. L. Billinge

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…

Computational Geometry · Computer Science 2018-01-26 Luis-Evaristo Caraballo, José-Miguel Díaz-Báñez, Nadine Kroher

Model-Based Clustering of Functional Data Via Random Projection Ensembles

Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…

Methodology · Statistics 2025-12-18 Matteo Mori, Laura Anderlucci

Good Clusterings Have Large Volume

The clustering of a data set is one of the core tasks in data analytics. Many clustering algorithms exhibit a strong contrast between a favorable performance in practice and bad theoretical worst-cases. Prime examples are least-squares…

Optimization and Control · Mathematics 2018-09-05 S. Borgwardt, F. Happach

Clustering in Partially Labeled Stochastic Block Models via Total Variation Minimization

A main task in data analysis is to organize data points into coherent groups or clusters. The stochastic block model is a probabilistic model for the cluster structure. This model prescribes different probabilities for the presence of edges…

Machine Learning · Computer Science 2020-09-24 Alexander Jung

A Random Finite Set Model for Data Clustering

The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a…

Machine Learning · Statistics 2017-03-16 Dinh Phung, Ba-Ngu Vo

Constrained variable clustering and the best basis problem in functional data analysis

Functional data analysis involves data described by regular functions rather than by a finite number of real valued variables. While some robust data analysis methods can be applied directly to the very high dimensional vectors obtained…

Machine Learning · Statistics 2012-01-06 Fabrice Rossi, Yves Lechevallier

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets that can be clustered well. More generally, despite the ubiquity and the…

Machine Learning · Computer Science 2012-05-23 Amit Daniely, Nati Linial, Michael Saks

Constructing Clustering Transformations

Clustering is one of the fundamental tasks in data analytics and machine learning. In many situations, different clusterings of the same data set become relevant. For example, different algorithms for the same clustering task may return…

Optimization and Control · Mathematics 2020-04-06 Steffen Borgwardt, Charles Viss

Selection of the Number of Clusters in Functional Data Analysis

Identifying the number $K$ of clusters in a dataset is one of the most difficult problems in clustering analysis. A choice of $K$ that correctly characterizes the features of the data is essential for building meaningful clusters. In this…

Methodology · Statistics 2019-05-06 Adriano Zanin Zambom, Julian A. Collazos, Ronaldo Dias

Algorithmic Statistics

While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing…

Statistics Theory · Mathematics 2007-07-16 Peter Gacs, John Tromp, Paul Vitanyi

Review of Clustering Methods for Functional Data

Functional data clustering aims to identify heterogeneous morphological patterns in the continuous functions underlying the discrete measurements/observations. Applications of functional data clustering have appeared in many publications across…

Methodology · Statistics 2022-10-04 Mimi Zhang, Andrew Parnell

Cluster Explanation via Polyhedral Descriptions

Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is…

Machine Learning · Computer Science 2022-10-18 Connor Lawless, Oktay Gunluk

A functional clustering algorithm for the analysis of dynamic network data

We formulate a novel technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines…

Neurons and Cognition · Quantitative Biology 2015-05-13 S. Feldt, J. Waddell, V. L. Hetrick, J. D. Berke, M. Zochowski

Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation

We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into $K$ clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of…

Machine Learning · Statistics 2010-04-06 Georges Hébrail, Bernard Hugueney, Yves Lechevallier, Fabrice Rossi

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

The problem of estimating the number of clusters (say k) is one of the major challenges for partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…

Machine Learning · Computer Science 2025-01-28 Duy-Tai Dinh, Tsutomu Fujinami, Van-Nam Huynh