Related papers: Parallel K-Medoids++ Spatial Clustering Algorithm …

Fast Clustering using MapReduce

Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-09 Alina Ene, Sungjin Im, Benjamin Moseley

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Multiple K Means++ Clustering of Satellite Image Using Hadoop MapReduce and Spark

Clustering of image is one of the important steps of mining satellite images. In our experiment we have simultaneously run multiple K-means algorithms with different initial centroids and values of k in the same iteration of MapReduce jobs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-09 Tapan Sharma, Dr. Vinod Shokeen, Dr. Sunil Mathur

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache, M-Tahar Kechadi

Efficient techniques for mining spatial databases

Clustering is one of the major tasks in data mining. In the last few years, Clustering of spatial data has received a lot of research attention. Spatial databases are components of many advanced information systems like geographic…

Databases · Computer Science 2012-06-04 Mohamed A. El-Zawawy

Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…

Machine Learning · Computer Science 2024-03-28 Rustam Mussabayev, Ravil Mussabayev

A New Parallelization Method for K-means

K-means is a popular clustering method used in data mining area. To work with large datasets, researchers propose PKMeans, which is a parallel k-means on MapReduce. However, the existing k-means parallelization methods including PKMeans…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-30 Shikai Jin, Yuxuan Cui, Chunli Yu

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó, Aritz Pérez, Jose A. Lozano

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan

Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm

Distributed Computation has been a recent trend in engineering research. Parallel Computation is widely used in different areas of Data Mining, Image Processing, Simulating Models, Aerodynamics and so forth. One of the major usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-28 C Rashmi

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

Parallelization of the K-Means Algorithm with Applications to Big Data Clustering

The K-Means clustering using Lloyd's algorithm is an iterative approach to partition the given dataset into K different clusters. The algorithm assigns each point to the cluster based on the following objective function \[\ \min…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-21 Ashish Srivastava, Mohammed Nawfal

Effective Spatial Data Partitioning for Scalable Query Processing

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

Scalable Initialization Methods for Large-Scale Clustering

In this work, two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means|| type of an initialization strategy. The second proposal also utilizes…

Machine Learning · Computer Science 2020-07-24 Joonas Hämäläinen, Tommi Kärkkäinen, Tuomo Rossi

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin, Mimi Zhang

K-sets+: a Linear-time Clustering Algorithm for Data Points with a Sparse Similarity Measure

In this paper, we first propose a new iterative algorithm, called the K-sets+ algorithm for clustering data points in a semi-metric space, where the distance measure does not necessarily satisfy the triangular inequality. We show that the…

Data Structures and Algorithms · Computer Science 2017-05-12 Cheng-Shang Chang, Chia-Tai Chang, Duan-Shin Lee, Li-Heng Liou

Document Clustering using K-Means and K-Medoids

With the huge upsurge of information in day-to-day life, it has become difficult to assemble relevant information in the nick of time. But people are always short of time and need everything quickly. Hence clustering was introduced to…

Information Retrieval · Computer Science 2015-03-02 Rakesh Chandra Balabantaray, Chandrali Sarma, Monica Jha

Document Clustering using K-Medoids

People are always searching for information and tend to use the internet to find it, but the internet holds such a huge assemblage of data that it becomes difficult for the reader to get the most accurate data. To make it easier for people to gather…

Information Retrieval · Computer Science 2015-04-07 Monica Jha