Related papers: Linear Time Algorithm for Projective Clustering

Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P}d(p,…

Computational Geometry · Computer Science 2015-06-03 Michael Kerber, Sharath Raghvendra

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…

Machine Learning · Computer Science 2023-10-26 Moses Charikar, Monika Henzinger, Lunjia Hu, Maximilian Vötsch, Erik Waingarten

Temporal Clustering

We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological…

Data Structures and Algorithms · Computer Science 2017-10-17 Tamal K. Dey, Alfred Rossi, Anastasios Sidiropoulos

On Generalization Bounds for Projective Clustering

Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the…

Machine Learning · Computer Science 2023-10-16 Maria Sofia Bucarelli, Matilde Fjeldsø Larsen, Chris Schwiegelshohn, Mads Bech Toftrup

Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds

Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused…

Machine Learning · Computer Science 2018-03-05 Dan Kushnir, Shirin Jalali, Iraj Saniee

Clustering with Neighborhoods

In the standard planar $k$-center clustering problem, one is given a set $P$ of $n$ points in the plane, and the goal is to select $k$ center points, so as to minimize the maximum distance over points in $P$ to their nearest center. Here we…

Computational Geometry · Computer Science 2021-09-29 Hongyao Huang, Georgiy Klimenko, Benjamin Raichel
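
The standard planar $k$-center objective described above can be made concrete with a short sketch. The sketch has two parts:

- It evaluates the cost of a given set of centers, that is, the maximum distance from any point to its nearest center.
- For illustration, it picks centers with Gonzalez's farthest-point heuristic, a classical 2-approximation for $k$-center.

This is not the neighborhood variant studied in the paper. The function names `kcenter_cost` and `greedy_kcenter` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kcenter_cost(points: List[Point], centers: List[Point]) -> float:
    """Maximum, over all points, of the distance to the nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)

def greedy_kcenter(points: List[Point], k: int) -> List[Point]:
    """Gonzalez's farthest-point heuristic (a 2-approximation for k-center)."""
    centers = [points[0]]  # start from an arbitrary point
    # distance from each point to its nearest chosen center so far
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # the point farthest from all current centers becomes the next center
        i = max(range(len(points)), key=lambda j: nearest[j])
        centers.append(points[i])
        # refresh nearest-center distances with the newly added center
        nearest = [min(d, math.dist(p, points[i])) for p, d in zip(points, nearest)]
    return centers

P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
C = greedy_kcenter(P, 2)
print(C, kcenter_cost(P, C))  # prints [(0.0, 0.0), (11.0, 0.0)] 1.0
```

Each greedy step costs $O(n)$ distance evaluations, so choosing $k$ centers takes $O(nk)$ time overall.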

Analysis of Sparse Subspace Clustering: Experiments and Random Projection

Clustering can be defined as the process of assembling objects into a number of groups whose elements are similar to each other in some manner. As a technique that is used in many domains, such as face clustering, plant categorization,…

Machine Learning · Computer Science 2022-04-05 Mehmet F. Demirel, Enrico Au-Yeung

Faster Projective Clustering Approximation of Big Data

In projective clustering we are given a set of $n$ points in $\mathbb{R}^d$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $\mathbb{R}^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled)…

Data Structures and Algorithms · Computer Science 2020-11-30 Adiel Statman, Liat Rozenberg, Dan Feldman

Model-Based Clustering of Functional Data Via Random Projection Ensembles

Clustering functional data is a challenging task due to intrinsic infinite-dimensionality and the need for stable, data-adaptive partitioning. In this work, we propose a clustering framework based on Random Projections, which simultaneously…

Methodology · Statistics 2025-12-18 Matteo Mori, Laura Anderlucci

Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus…

Data Structures and Algorithms · Computer Science 2023-07-11 Nathan Cordner, George Kollios

Time series clustering based on prediction accuracy of global forecasting models

In this paper, a novel method to perform model-based clustering of time series is proposed. The procedure relies on two iterative steps: (i) K global forecasting models are fitted via pooling by considering the series pertaining to each…

Machine Learning · Statistics 2023-05-02 Ángel López Oriona, Pablo Montero Manso, José Antonio Vilar Fernández

A simple D^2-sampling based PTAS for k-means and other Clustering Problems

Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ centers $C = \{c_1,\ldots,c_k\}$, $c_i \in \mathbb{R}^d$, such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$…

Data Structures and Algorithms · Computer Science 2012-01-23 Ragesh Jaiswal, Amit Kumar, Sandeep Sen
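
The $k$-means objective $\sum_{x \in P} d(x,C)^2$ and the D^2-sampling idea named in the title can be illustrated with the well-known $k$-means++ seeding rule. Under that rule, each new center is drawn with probability proportional to a point's squared distance to its nearest already-chosen center. This is a sketch of that seeding step only, assuming Euclidean distance. It is not the PTAS from the paper. The helper names `kmeans_cost` and `d2_seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist_to_centers(x: Point, centers: Sequence[Point]) -> float:
    """d(x, C)^2: squared Euclidean distance from x to its nearest center."""
    return min(math.dist(x, c) ** 2 for c in centers)

def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """The k-means objective: sum over x in P of d(x, C)^2."""
    return sum(sq_dist_to_centers(x, centers) for x in points)

def d2_seed(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++-style D^2 sampling of up to k initial centers."""
    centers = [rng.choice(points)]  # first center chosen uniformly at random
    while len(centers) < k:
        weights = [sq_dist_to_centers(x, centers) for x in points]
        if sum(weights) == 0:  # every point already coincides with a center
            break
        # next center drawn with probability proportional to d(x, C)^2;
        # points that are already centers have weight 0 and are never redrawn
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

P = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
C = d2_seed(P, 2, random.Random(0))
print(C, kmeans_cost(P, C))
```

Because far-away points get large weights, the second center in this toy input lands in the opposite pair with high probability. That tendency to spread centers across clusters is the reason D^2 sampling is a popular seeding step.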

On Variants of k-means Clustering

Clustering problems often arise in fields such as data mining and machine learning, where the task is to group a collection of objects into clusters of similar objects with respect to a similarity (or dissimilarity) measure. Among the clustering problems,…

Computational Geometry · Computer Science 2015-12-10 Sayan Bandyapadhyay, Kasturi Varadarajan

Learning with Clustering Structure

We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text…

Machine Learning · Computer Science 2016-09-20 Vincent Roulet, Fajwel Fogel, Alexandre d'Aspremont, Francis Bach

Probabilistic Partitive Partitioning (PPP)

Clustering is an NP-hard problem. Thus, no efficient exact algorithm is known, and heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets, computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

New Coresets for Projective Clustering and Applications

$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine…

Machine Learning · Computer Science 2022-03-10 Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, Dan Feldman

A PTAS for the Minimum Consensus Clustering Problem with a Fixed Number of Clusters

The Consensus Clustering problem has been introduced as an effective way to analyze the results of different microarray experiments. The problem consists of looking for a partition that best summarizes a set of input partitions (each…

Data Structures and Algorithms · Computer Science 2009-07-13 Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

Efficient Large Scale Clustering based on Data Partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensional data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

StruClus: Structural Clustering of Large-Scale Graph Databases

We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…

Databases · Computer Science 2016-10-03 Till Schäfer, Petra Mutzel

Greedy Subspace Clustering

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses…

Machine Learning · Statistics 2014-11-03 Dohyung Park, Constantine Caramanis, Sujay Sanghavi