Related papers: Linear Time Algorithm for Projective Clustering
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P}d(p,…
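The objective above is cut off mid-formula. As a hedged illustration only, the sketch below assumes the common form $\big(\sum_{p\in P} d(p,\mathcal{F})^\rho\big)^{1/\rho}$. Here $d(p,\mathcal{F})$ is taken to be the Euclidean distance from $p$ to its nearest flat, and $\rho=\infty$ is read as the maximum such distance. The representation is also our assumption: each flat is an origin plus a $q\times d$ matrix with orthonormal rows. The helper names are hypothetical.

```python
import numpy as np

def dist_to_flat(p, origin, basis):
    """Euclidean distance from p to the affine flat origin + rowspan(basis).

    Assumption: `basis` is a (q, d) array with orthonormal rows (q may be 0).
    """
    v = p - origin
    proj = basis.T @ (basis @ v)  # component of v lying in the flat's direction space
    return float(np.linalg.norm(v - proj))

def projective_cost(P, flats, rho):
    """Assumed objective: l_rho norm of each point's distance to its nearest flat.

    `flats` is a list of (origin, basis) pairs; rho = inf gives the max distance.
    """
    d = np.array([min(dist_to_flat(p, o, B) for o, B in flats) for p in P])
    if np.isinf(rho):
        return float(d.max())
    return float((d ** rho).sum() ** (1.0 / rho))
```

For two horizontal lines $y=0$ and $y=3$ in the plane and the points $(0,1)$ and $(3,-2)$, the nearest-flat distances are 1 and 2. The cost is therefore 3 for $\rho=1$, $\sqrt{5}$ for $\rho=2$ and 2 for $\rho=\infty$.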
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…
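The $\Omega(ndk)$ figure can be read off a minimal sketch of plain Lloyd's algorithm. In each iteration, the assignment step evaluates all $n\cdot k$ point-to-center distances, and each one costs $O(d)$. The function name, the fixed iteration count and the choice to leave empty clusters in place are illustrative assumptions.

```python
import numpy as np

def lloyd(X, centers, iters=10):
    """Plain Lloyd's algorithm (illustrative sketch, fixed number of iterations)."""
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assignment step: n x k matrix of squared distances, O(ndk) work.
        D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(C)):
            members = X[labels == j]
            if len(members):  # assumption: an empty cluster keeps its old center
                C[j] = members.mean(axis=0)
    return C, labels
```

Starting from centers $(0,0)$ and $(10,10)$ on the points $(0,0),(0,1),(10,10),(10,11)$, the centers converge to $(0,0.5)$ and $(10,10.5)$.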
We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological…
Given a set of points, clustering consists of finding a partition of the point set into $k$ clusters such that each point is as close as possible to the center to which it is assigned. Most commonly, centers are points themselves, which leads to the…
Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused…
In the standard planar $k$-center clustering problem, one is given a set $P$ of $n$ points in the plane, and the goal is to select $k$ center points, so as to minimize the maximum distance over points in $P$ to their nearest center. Here we…
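The sketch below illustrates the standard problem only, not the variant this paper studies (the snippet cuts off before describing it). It computes the $k$-center cost and runs the classic farthest-point greedy heuristic (Gonzalez's algorithm), a well-known 2-approximation. It is not the paper's method, and the function names are hypothetical.

```python
import math

def kcenter_cost(P, centers):
    """Maximum over points of the distance to the nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in P)

def gonzalez(P, k):
    """Farthest-point greedy: a 2-approximation for k-center (not the paper's algorithm)."""
    centers = [P[0]]
    d = [math.dist(p, P[0]) for p in P]  # distance of each point to its nearest chosen center
    while len(centers) < min(k, len(P)):
        i = max(range(len(P)), key=d.__getitem__)  # the point farthest from all centers
        centers.append(P[i])
        d = [min(d[j], math.dist(P[j], P[i])) for j in range(len(P))]
    return centers
```

On the points $(0,0),(1,0),(10,0),(11,0)$ with $k=2$, the greedy picks $(0,0)$ and then $(11,0)$, for a cost of 1.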
Clustering can be defined as the process of assembling objects into a number of groups whose elements are similar to each other in some manner. As a technique that is used in many domains, such as face clustering, plant categorization,…
In projective clustering we are given a set of $n$ points in $\mathbb{R}^d$ and wish to cluster them into a set $S$ of $k$ linear subspaces in $\mathbb{R}^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled)…
Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…
Consensus clustering (or clustering aggregation) takes as input $k$ partitions of a given ground set $V$ and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus…
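The snippet does not define disagreement. A common choice, assumed here, counts the pairs of elements of $V$ that one partition places together and the other separates. In the minimal sketch below, each partition is encoded (by assumption) as a list of cluster labels indexed by the elements of $V$. The function names are hypothetical.

```python
from itertools import combinations

def disagreements(a, b):
    """Pairs (u, v) co-clustered in exactly one of the label lists a and b."""
    return sum((a[u] == a[v]) != (b[u] == b[v])
               for u, v in combinations(range(len(a)), 2))

def consensus_cost(candidate, partitions):
    """Total pairwise disagreement of a candidate partition with all input partitions."""
    return sum(disagreements(candidate, p) for p in partitions)
```

For example, `consensus_cost([0, 0, 1], [[0, 0, 1], [0, 1, 1]])` returns 2. The first input agrees completely, and the second disagrees on the pairs $(0,1)$ and $(1,2)$.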
In this paper, a novel method to perform model-based clustering of time series is proposed. The procedure relies on two iterative steps: (i) K global forecasting models are fitted via pooling by considering the series pertaining to each…
Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ {\em centers} $C = \{c_1,...,c_k\}, c_i \in \mathbb{R}^d,$ such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$…
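The definition of $d(x,C)$ is cut off above. Assuming it denotes the Euclidean distance from $x$ to its nearest center in $C$, the objective can be evaluated in a few lines of NumPy. `kmeans_cost` is a hypothetical helper name.

```python
import numpy as np

def kmeans_cost(P, C):
    """sum over x in P of d(x, C)^2, assuming d(x, C) is the distance to the nearest center."""
    D = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # n x k squared distances
    return float(D.min(axis=1).sum())
```

With $P=\{(0,0),(0,2),(5,0)\}$ and $C=\{(0,1),(5,0)\}$, the squared nearest-center distances are 1, 1 and 0, so the cost is 2.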
\textit{Clustering problems} often arise in fields such as data mining and machine learning, where a collection of objects must be grouped so that objects in the same group are similar with respect to a similarity (or dissimilarity) measure. Among the clustering problems,…
We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…
Clustering is an NP-hard problem. Thus, no efficient exact algorithm is known, and heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets computational efficiencies…
$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine…
The Consensus Clustering problem has been introduced as an effective way to analyze the results of different microarray experiments. The problem consists of looking for a partition that best summarizes a set of input partitions (each…
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensional data, heterogeneity, and high…
We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…
We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses…