Related papers: An Effective Evolutionary Clustering Algorithm: He…
K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…
In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification either 1) are…
Clustering is one of the widely used techniques to find out patterns from a dataset that can be applied in different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped into local minima if…
Big Data is a massive volume of both structured and unstructured data that is too large and it also difficult to process using traditional techniques. Clustering algorithms have developed as a powerful learning tool that can exactly analyze…
One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…
The purpose of this paper is to improve the traditional K-means algorithm. In the traditional K mean clustering algorithm, the initial clustering centers are generated randomly in the data set. It is easy to fall into the local minimum…
Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to ma- nipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…
Data mining focuses on discovering interesting, non-trivial and meaningful information from large datasets. Data clustering is one of the unsupervised and descriptive data mining task which group data based on similarity features and…
This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…
Clustering is one of the most fundamental tools in the artificial intelligence area, particularly in the pattern recognition and learning theory. In this paper, we propose a simple, but novel approach for variance-based k-clustering tasks,…
K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed…
Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identification of co-expressed genes and coherent patterns is the central goal in microarray or…
The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…
Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…
Feature selection is an important and challenging task in high dimensional clustering. For example, in genomics, there may only be a small number of genes that are differentially expressed, which are informative to the overall clustering…
The number of accidents and health diseases which are increasing at an alarming rate are resulting in a huge increase in the demand for blood. There is a necessity for the organized analysis of the blood donor database or blood banks…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
Clustering is widely used in different field such as biology, psychology, and economics. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with…