Related papers: Robust seed selection algorithm for k-means type a…

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

Exact Acceleration of K-Means++ and K-Means$\|$

K-Means++ and its distributed variant K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the…

Machine Learning · Computer Science 2021-05-10 Edward Raff

Fast and Accurate $k$-means++ via Rejection Sampling

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on…

Machine Learning · Computer Science 2020-12-23 Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

Improved Outlier Robust Seeding for k-means

The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$…

Machine Learning · Computer Science 2023-09-07 Amit Deshpande, Rameshwar Pratap

Seeding K-Means using Method of Moments

K-means is one of the most widely used algorithms for clustering in Data Mining applications, which attempts to minimize the sum of the square of the Euclidean distance of the points in the clusters from the respective means of the…

Machine Learning · Computer Science 2016-11-01 Sayantan Dasgupta

A bad 2-dimensional instance for k-means++

The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: …

Data Structures and Algorithms · Computer Science 2013-06-19 Ragesh Jaiswal, Prachi Jain, Saumya Yadav

Parallelization of Kmeans++ using CUDA

K-means++ is an algorithm which is invented to improve the process of finding initial seeds in K-means algorithm. In this algorithm, initial seeds are chosen consecutively by a probability which is proportional to the distance to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-07 Maliheh Heydarpour Shahrezaei, Reza Tavoli

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…

Machine Learning · Computer Science 2020-09-23 Ali Hassani, Amir Iranmanesh, Mahdi Eftekhari, Abbas Salemi

A tight lower bound instance for k-means++ in constant dimension

The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the…

Data Structures and Algorithms · Computer Science 2014-01-15 Anup Bhattacharya, Ragesh Jaiswal, Nir Ailon

An enhanced method of initial cluster center selection for K-means algorithm

Clustering is one of the widely used techniques to find out patterns from a dataset that can be applied in different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped into local minima if…

Machine Learning · Computer Science 2022-10-19 Zillur Rahman, Md. Sabir Hossain, Mohammad Hasan, Ahmed Imteaj

Cluster-level Group Representativity Fairness in $k$-means Clustering

There has been much interest recently in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and gender. We observe that clustering algorithms could…

Machine Learning · Computer Science 2023-01-02 Stanley Simoes, Deepak P, Muiris MacCarthaigh

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

Multiple Kernel $k$-Means Clustering by Selecting Representative Kernels

To cluster data that are not linearly separable in the original feature space, $k$-means clustering was extended to the kernel version. However, the performance of kernel $k$-means clustering largely depends on the choice of kernel…

Machine Learning · Computer Science 2018-11-02 Yaqiang Yao, Huanhuan Chen

Are Easy Data Easy (for K-Means)

This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm. The concept of well-separatedness used here is derived directly from the common definition of clusters,…

Machine Learning · Computer Science 2023-08-07 Mieczysław A. Kłopotek

Deterministic Feature Selection for $k$-means Clustering

We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these…

Machine Learning · Computer Science 2016-11-17 Christos Boutsidis, Malik Magdon-Ismail

Improvement of K Mean Clustering Algorithm Based on Density

The purpose of this paper is to improve the traditional K-means algorithm. In the traditional K mean clustering algorithm, the initial clustering centers are generated randomly in the data set. It is easy to fall into the local minimum…

Machine Learning · Computer Science 2018-10-11 Su Chang, Xu Zhenzong, Gao Xuan

Improved seeding strategies for k-means and k-GMM

We revisit the randomized seeding techniques for k-means clustering and k-GMM (Gaussian Mixture model fitting with Expectation-Maximization), formalizing their three key ingredients: the metric used for seed sampling, the number of…

Machine Learning · Computer Science 2025-11-04 Guillaume Carrière, Frédéric Cazals

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta, Nave Frost, Michal Moshkovitz, Cyrus Rashtchian

Selective inference for k-means clustering

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we…

Methodology · Statistics 2022-03-30 Yiqun T. Chen, Daniela M. Witten

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó, Aritz Pérez, Jose A. Lozano