Related papers: Clustering processes
The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist,…
The problem of time-series clustering is considered in the case where each data-point is a sample generated by a piecewise stationary ergodic process. Stationary processes are perhaps the most general class of processes considered in…
Stationarity is a very general, qualitative assumption, that can be assessed on the basis of application specifics. It is thus a rather attractive assumption to base statistical analysis on, especially for problems for which less general…
The problem of change-point estimation is considered under a general framework where the data are generated by unknown stationary ergodic process distributions. In this context, the consistent estimation of the number of change-points is…
We review some developments on clustering stochastic processes and come with the conclusion that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the…
In a stationary ergodic process, clustering is defined as the tendency of events to appear in series of increased frequency separated by longer breaks. Such behavior, contradicting the theoretical "unbiased behavior" with exponential…
A computational theory for clustering and a semi-supervised clustering algorithm is presented. Clustering is defined to be the obtainment of groupings of data such that each group contains no anomalies with respect to a chosen grouping…
Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or…
Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of…
Cluster synchronization is a phenomenon in which a network self-organizes into a pattern of synchronized sets. It has been shown that diverse patterns of stable cluster synchronization can be captured by symmetries of the network. Here we…
Model selection is a major challenge in non-parametric clustering. There is no universally admitted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty to find a universal evaluation…
The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on…
The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like…
We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task?…
We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based…
Designing efficient, effective, and consistent metric clustering algorithms is a significant challenge attracting growing attention. Traditional approaches focus on the stability of cluster centers; unfortunately, this neglects the…
Consensus clustering fuses diverse basic partitions (i.e., clustering results obtained from conventional clustering methods) into an integrated one, which has attracted increasing attention in both academic and industrial areas due to its…
A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the…
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial…
A celebrated result of Pollard proves asymptotic consistency for $k$-means clustering when the population distribution has finite variance. In this work, we point out that the population-level $k$-means clustering problem is, in fact,…