Related papers: A New Parallelization Method for K-means
The kernel $k$-means is an effective method for data clustering which extends the commonly-used $k$-means algorithm to work on a similarity matrix over complex data structures. The kernel $k$-means algorithm is however computationally very…
The K-Means clustering using LLoyd's algorithm is an iterative approach to partition the given dataset into K different clusters. The algorithm assigns each point to the cluster based on the following objective function \[\ \min…
The k-means algorithm is one of the most common clustering algorithms and widely used in data mining and pattern recognition. The increasing computational requirement of big data applications makes hardware acceleration for the k-means…
Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of the geospatial information technologies, the size of spatial information data is growing exponentially which…
Clustering of image is one of the important steps of mining satellite images. In our experiment we have simultaneously run multiple K-means algorithms with different initial centroids and values of k in the same iteration of MapReduce jobs.…
K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…
The Incremental K-means (IKM), an improved version of K-means (KM), was introduced to improve the clustering quality of KM significantly. However, the speed of IKM is slower than KM. My thesis proposes two algorithms to speed up IKM while…
The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…
In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost could be prohibitively high as the data size and the cluster number are large. It is well…
The capability of classifying and clustering a desired set of data is an essential part of building knowledge from data. However, as the size and dimensionality of input data increases, the run-time for such clustering algorithms is…
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…
We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate…
Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version…
This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…
K-means++ is an algorithm which is invented to improve the process of finding initial seeds in K-means algorithm. In this algorithm, initial seeds are chosen consecutively by a probability which is proportional to the distance to the…
Clustering is an important tool in data analysis, with K-means being popular for its simplicity and versatility. However, it cannot handle non-linearly separable clusters. Kernel K-means addresses this limitation but requires a large kernel…
Distributed Computation has been a recent trend in engineering research. Parallel Computation is widely used in different areas of Data Mining, Image Processing, Simulating Models, Aerodynamics and so forth. One of the major usage of…
$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…