Related papers: An Effective Evolutionary Clustering Algorithm: He…

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev
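The first abstract describes K-means as the standard algorithm for the Euclidean MSSC model. That model minimizes the sum of squared distances from each point to its nearest center. The sketch below shows the baseline Lloyd iteration plus the MSSC objective it decreases. It is not the big-data method of the paper above. It assumes random initialization from the data points and a fixed iteration cap. The names `lloyd_kmeans` and `mssc_objective` are hypothetical helpers for illustration.

```python
import numpy as np

def mssc_objective(X, centers):
    # Sum over points of the squared Euclidean distance to the nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def lloyd_kmeans(X, k, n_iter=100, seed=None):
    # Plain Lloyd iteration for the MSSC model (hypothetical helper).
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen uniformly at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points;
        # a center that loses all its points is left where it was.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment so the labels match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

Each assignment and update step can only lower or keep the MSSC objective. That is why Lloyd's method converges, but possibly only to a local minimum. Several of the entries below target exactly this weakness.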

Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification either 1) are…

Machine Learning · Computer Science 2023-01-04 Shivin Srivastava , Siddharth Bhatia , Lingxiao Huang , Lim Jun Heng , Kenji Kawaguchi , Vaibhav Rajan

An enhanced method of initial cluster center selection for K-means algorithm

Clustering is one of the most widely used techniques for finding patterns in a dataset, and it can be applied in many different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped in local minima if…

Machine Learning · Computer Science 2022-10-19 Zillur Rahman , Md. Sabir Hossain , Mohammad Hasan , Ahmed Imteaj

A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis

Big Data is a massive volume of both structured and unstructured data that is too large and also difficult to process using traditional techniques. Clustering algorithms have emerged as a powerful learning tool that can exactly analyze…

Machine Learning · Computer Science 2020-02-24 Y. A. Joarder , Mosabbir Ahmed

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…

Machine Learning · Computer Science 2020-09-23 Ali Hassani , Amir Iranmanesh , Mahdi Eftekhari , Abbas Salemi

Improvement of K Mean Clustering Algorithm Based on Density

The purpose of this paper is to improve the traditional K-means algorithm. In the traditional K-means clustering algorithm, the initial cluster centers are generated randomly from the data set, so the algorithm easily falls into a local minimum…

Machine Learning · Computer Science 2018-10-11 Su Chang , Xu Zhenzong , Gao Xuan

An efficient K-means algorithm for Massive Data

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…

Machine Learning · Statistics 2016-05-11 Marco Capó , Aritz Pérez , José Antonio Lozano

Influence of Swarm Intelligence in Data Clustering Mechanisms

Data mining focuses on discovering interesting, non-trivial and meaningful information from large datasets. Data clustering is one of the unsupervised and descriptive data mining tasks; it groups data based on similarity features and…

Neural and Evolutionary Computing · Computer Science 2023-05-09 Pitawelayalage Dasun Dileepa Pitawela , Gamage Upeksha Ganegoda

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev , Rustam Mussabayev

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances of datapoints to only a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to…

Machine Learning · Computer Science 2024-03-28 Rustam Mussabayev , Ravil Mussabayev

Too Much Information Kills Information: A Clustering Perspective

Clustering is one of the most fundamental tools in artificial intelligence, particularly in pattern recognition and learning theory. In this paper, we propose a simple but novel approach for variance-based k-clustering tasks,…

Machine Learning · Computer Science 2020-09-17 Yicheng Xu , Vincent Chau , Chenchen Wu , Yong Zhang , Vassilis Zissimopoulos , Yifei Zou

An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to the initial selection of cluster-center seeds. K-means++ has been proposed…

Machine Learning · Computer Science 2016-04-19 Fouad Khan
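The abstract above cites K-means++ as the established answer to seed sensitivity. K-means++ seeding ("D² sampling") works as follows:

- Pick the first center uniformly at random.
- Pick each later center with probability proportional to its squared distance from the nearest center chosen so far.

The sketch below shows that standard seeding. It is not the seed-selection algorithm proposed in this paper. `kmeanspp_init` is a hypothetical name, and the fallback for fully duplicated data is an added assumption.

```python
import numpy as np

def kmeanspp_init(X, k, seed=None):
    # Standard k-means++ (D^2) seeding; hypothetical helper for illustration.
    rng = np.random.default_rng(seed)
    n = len(X)
    # First center: a data point chosen uniformly at random.
    first = rng.integers(n)
    centers = [X[first]]
    # Squared distance from every point to its nearest chosen center so far.
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:
            # Assumption: every point coincides with a chosen center,
            # so fall back to a uniform choice.
            idx = rng.integers(n)
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choice(n, p=d2 / total)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)
```

Points that are already centers have zero weight, so they are never re-drawn while any other point has a positive distance. Because the draws are random, runs with different seeds can still yield different assignments. That replicability issue is the one this paper's title addresses.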

Effective Clustering Algorithms for Gene Expression Data

Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identification of co-expressed genes and coherent patterns is the central goal in microarray or…

Computational Engineering, Finance, and Science · Computer Science 2012-01-25 T. Chandrasekhar , K. Thangavel , E. Elayaraja

Big-Data Clustering: K-Means or K-Indicators?

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…

Machine Learning · Computer Science 2019-06-04 Feiyu Chen , Yuchen Yang , Liwei Xu , Taiping Zhang , Yin Zhang

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques, and mainly distributed clustering, have been widely used over the last decade because they deal with very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache , M-Tahar Kechadi

CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering

Feature selection is an important and challenging task in high dimensional clustering. For example, in genomics, there may only be a small number of genes that are differentially expressed, which are informative to the overall clustering…

Methodology · Statistics 2019-10-07 Xiangrui Zeng , Hongyu Zheng

Mine Blood Donors Information through Improved K-Means Clustering

The number of accidents and health diseases, which is increasing at an alarming rate, is resulting in a huge increase in the demand for blood. There is a need for organized analysis of the blood donor database or blood banks…

Databases · Computer Science 2013-09-11 Bondu Venkateswarlu , Prof G. S. V. Prasad Raju

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their ease in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Partitioning Clustering algorithms for handling numerical and categorical data: a review

Clustering is widely used in different fields, such as biology, psychology, and economics. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with…

Databases · Computer Science 2019-07-03 Trupti M. Kodinariya , Dr. Prashant R. Makwana