Related papers: Robust seed selection algorithm for k-means type a…
K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…
K-Means++ and its distributed variant K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the…
$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on…
The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$…
K-means is one of the most widely used algorithms for clustering in Data Mining applications, which attempts to minimize the sum of the square of the Euclidean distance of the points in the clusters from the respective means of the…
The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: {quote}…
K-means++ is an algorithm which is invented to improve the process of finding initial seeds in K-means algorithm. In this algorithm, initial seeds are chosen consecutively by a probability which is proportional to the distance to the…
One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…
The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the…
Clustering is one of the widely used techniques to find out patterns from a dataset that can be applied in different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped into local minima if…
There has been much interest recently in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and gender. We observe that clustering algorithms could…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
To cluster data that are not linearly separable in the original feature space, $k$-means clustering was extended to the kernel version. However, the performance of kernel $k$-means clustering largely depends on the choice of kernel…
This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm. The concept of well-separatedness used here is derived directly from the common definition of clusters,…
We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these…
The purpose of this paper is to improve the traditional K-means algorithm. In the traditional K mean clustering algorithm, the initial clustering centers are generated randomly in the data set. It is easy to fall into the local minimum…
We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of…
Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…
We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…