Related papers: Quantized Compressive K-Means
The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM), that estimates cluster…
We describe a general framework -- compressive statistical learning -- for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized…
Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an…
Clustering data is a popular feature in the field of unsupervised machine learning. Most algorithms aim to find the best method to extract consistent clusters of data, but very few of them intend to cluster data that share the same…
Though mostly used as a clustering algorithm, k-means are originally designed as a quantization algorithm. Namely, it aims at providing a compression of a probability distribution with k points. Building upon [21, 33], we try to investigate…
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from…
We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper.The…
This article considers "compressive learning," an approach to large-scale machine learning where datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is…
The compressive learning framework reduces the computational cost of training on large-scale datasets. In a sketching phase, the data is first compressed to a lightweight sketch vector, obtained by mapping the data samples through a…
K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes…
The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive…
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…
We consider a network of binary-valued sensors with a fusion center. The fusion center has to perform K-means clustering on the binary data transmitted by the sensors. In order to reduce the amount of data transmitted within the network,…
The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the {\it correct target clustering} of the samples…
K-means defines one of the most employed centroid-based clustering algorithms with performances tied to the data's embedding. Intricate data embeddings have been designed to push $K$-means performances at the cost of reduced theoretical…
K-means is a classical clustering algorithm with wide applications. However, soft K-means, or fuzzy c-means at m=1, remains unsolved since 1981. To address this challenging open problem, we propose a novel clustering model, i.e.…
The $k$-means algorithm is arguably the most popular nonparametric clustering method but cannot generally be applied to datasets with incomplete records. The usual practice then is to either impute missing values under an assumed…
Cognitive diagnosis models (CDMs) are a popular tool for assessing students' mastery of sets of skills. Given a set of $K$ skills tested on an assessment, students are classified into one of $2^K$ latent skill set profiles that represent…
Machine learning algorithms perform well on identifying patterns in many different datasets due to their versatility. However, as one increases the size of the dataset, the computation time for training and using these statistical models…
Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of…