Related papers: Scalable Initialization Methods for Large-Scale Cl…

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization…

Machine Learning · Computer Science 2012-09-11 M. Emre Celebi, Hassan A. Kingravi, Patricio A. Vela

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate…

Machine Learning · Computer Science 2016-05-31 Eirikur Agustsson, Radu Timofte, Luc Van Gool

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the…

Machine Learning · Computer Science 2014-09-16 M. Emre Celebi, Hassan A. Kingravi

Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization…

Machine Learning · Computer Science 2013-04-30 M. Emre Celebi, Hassan A. Kingravi

Big-Data Clustering: K-Means or K-Indicators?

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…

Machine Learning · Computer Science 2019-06-04 Feiyu Chen, Yuchen Yang, Liwei Xu, Taiping Zhang, Yin Zhang

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…

Machine Learning · Computer Science 2024-03-28 Rustam Mussabayev, Ravil Mussabayev

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

K-Means is one of the most used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application it is well-known that it suffers from a series of disadvantages; it is only able to find local…

Machine Learning · Computer Science 2021-03-02 Avgoustinos Vouros, Stephen Langdell, Mike Croucher, Eleni Vasilaki

Scalable K-Means++

Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently…

Databases · Computer Science 2012-03-30 Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, Sergei Vassilvitskii

Adaptive Initialization Method for K-means Algorithm

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm uses the random method to determine the initial cluster centers, which makes clustering results…

Machine Learning · Computer Science 2019-11-28 Jie Yang, Yu-Kai Wang, Xin Yao, Chin-Teng Lin

An Experimental Comparison of Several Clustering and Initialization Methods

We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner take all version of…

Machine Learning · Computer Science 2015-05-19 Marina Meila, David Heckerman

CKmeans and FCKmeans: Two deterministic initialization procedures for Kmeans algorithm using a modified crowding distance

This paper presents two novel deterministic initialization procedures for K-means clustering based on a modified crowding distance. The procedures, named CKmeans and FCKmeans, use more crowded points as initial centroids. Experimental…

Machine Learning · Computer Science 2023-05-02 Abdesslem Layeb

Performance Analysis of AIM-K-means & K-means in Quality Cluster Generation

Among all the partition based clustering algorithms K-means is the most popular and well known method. It generally shows impressive results even in considerably large data sets. The computational complexity of K-means does not suffer from…

Machine Learning · Computer Science 2009-12-22 Samarjeet Borah, Mrinal Kanti Ghose

Normalization based K means Clustering Algorithm

K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, Normalization based K-means clustering algorithm (N-K means) is proposed. Proposed N-K means…

Machine Learning · Computer Science 2015-03-04 Deepali Virmani, Shweta Taneja, Geetika Malhotra

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…

Machine Learning · Computer Science 2020-09-23 Ali Hassani, Amir Iranmanesh, Mahdi Eftekhari, Abbas Salemi

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó, Aritz Pérez, Jose A. Lozano

Log-Time K-Means Clustering for 1D Data: Novel Approaches with Proof and Implementation

Clustering is a key task in machine learning, with $k$-means being widely used for its simplicity and effectiveness. While 1D clustering is common, existing methods often fail to exploit the structure of 1D data, leading to inefficiencies.…

Data Structures and Algorithms · Computer Science 2024-12-25 Jake Hyun

Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a…

Machine Learning · Computer Science 2023-07-17 Georgios Vardakas, Aristidis Likas