Related papers: A New Parallelization Method for K-means

Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

The kernel $k$-means is an effective method for data clustering which extends the commonly-used $k$-means algorithm to work on a similarity matrix over complex data structures. The kernel $k$-means algorithm is however computationally very…

Machine Learning · Computer Science 2014-01-30 Ahmed Elgohary, Ahmed K. Farahat, Mohamed S. Kamel, Fakhri Karray

Parallelization of the K-Means Algorithm with Applications to Big Data Clustering

The K-Means clustering using Lloyd's algorithm is an iterative approach to partition the given dataset into K different clusters. The algorithm assigns each point to the cluster based on the following objective function \[\ \min…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-21 Ashish Srivastava, Mohammed Nawfal

High-performance K-means Implementation based on a Simplified Map-Reduce Architecture

The k-means algorithm is one of the most common clustering algorithms and widely used in data mining and pattern recognition. The increasing computational requirement of big data applications makes hardware acceleration for the k-means…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-23 Zhehao Li, Jifang Jin, Lingli Wang

Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce

Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of the geospatial information technologies, the size of spatial information data is growing exponentially which…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-25 Xia Yue, Wang Man, Jun Yue, Guangcao Liu

Multiple K Means++ Clustering of Satellite Image Using Hadoop MapReduce and Spark

Clustering of image is one of the important steps of mining satellite images. In our experiment we have simultaneously run multiple K-means algorithms with different initial centroids and values of k in the same iteration of MapReduce jobs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-09 Tapan Sharma, Dr. Vinod Shokeen, Dr. Sunil Mathur

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

Improving The Performance Of The K-means Algorithm

The Incremental K-means (IKM), an improved version of K-means (KM), was introduced to improve the clustering quality of KM significantly. However, the speed of IKM is slower than KM. My thesis proposes two algorithms to speed up IKM while…

Machine Learning · Computer Science 2020-05-12 Tien-Dung Nguyen

Accelerating k-Means Clustering with Cover Trees

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang, Erich Schubert

Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost could be prohibitively high as the data size and the cluster number are large. It is well…

Machine Learning · Computer Science 2017-05-05 Cheng-Hao Deng, Wan-Lei Zhao

Using Multi-Core HW/SW Co-design Architecture for Accelerating K-means Clustering Algorithm

The capability of classifying and clustering a desired set of data is an essential part of building knowledge from data. However, as the size and dimensionality of input data increases, the run-time for such clustering algorithms is…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-25 Hadi Mardani Kamali

Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…

Machine Learning · Computer Science 2024-03-28 Rustam Mussabayev, Ravil Mussabayev

k^2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate…

Machine Learning · Computer Science 2016-05-31 Eirikur Agustsson, Radu Timofte, Luc Van Gool

Distributed Kernel K-Means for Large Scale Clustering

Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-10 Marco Jacopo Ferrarotti, Sergio Decherchi, Walter Rocchia

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó, Aritz Pérez, Jose A. Lozano

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Parallelization of Kmeans++ using CUDA

K-means++ is an algorithm invented to improve the process of finding initial seeds in the K-means algorithm. In this algorithm, initial seeds are chosen consecutively by a probability which is proportional to the distance to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-07 Maliheh Heydarpour Shahrezaei, Reza Tavoli

Communication-Avoiding Linear Algebraic Kernel K-Means on GPUs

Clustering is an important tool in data analysis, with K-means being popular for its simplicity and versatility. However, it cannot handle non-linearly separable clusters. Kernel K-means addresses this limitation but requires a large kernel…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-29 Julian Bellavita, Matthew Rubino, Nakul Iyer, Andrew Chang, Aditya Devarakonda, Flavio Vella, Giulia Guidi

Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm

Distributed Computation has been a recent trend in engineering research. Parallel Computation is widely used in different areas of Data Mining, Image Processing, Simulating Models, Aerodynamics and so forth. One of the major usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-28 C Rashmi

Fast Approximate $K$-Means via Cluster Closures

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…

Computer Vision and Pattern Recognition · Computer Science 2013-12-12 Jingdong Wang, Jing Wang, Qifa Ke, Gang Zeng, Shipeng Li