Related papers: Initialization Free Graph Based Clustering

A new Initial Centroid finding Method based on Dissimilarity Tree for K-means Algorithm

Cluster analysis is one of the primary data analysis technique in data mining and K-means is one of the commonly used partitioning clustering algorithm. In K-means algorithm, resulting set of clusters depend on the choice of initial…

Machine Learning · Computer Science 2015-09-11 Abhishek Kumar , Suresh Chandra Gupta

Determining Optimal Number of k-Clusters based on Predefined Level-of-Similarity

This paper proposes a centroid-based clustering algorithm which is capable of clustering data-points with n-features, without having to specify the number of clusters to be formed. The core logic behind the algorithm is a similarity…

Machine Learning · Computer Science 2020-10-08 Rabindra Lamsal , Shubham Katiyar

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization…

Machine Learning · Computer Science 2012-09-11 M. Emre Celebi , Hassan A. Kingravi , Patricio A. Vela

A Polynomial Algorithm for Balanced Clustering via Graph Partitioning

The objective of clustering is to discover natural groups in datasets and to identify geometrical structures which might reside there, without assuming any prior knowledge on the characteristics of the data. The problem can be seen as…

Computational Geometry · Computer Science 2018-01-26 Luis-Evaristo Caraballo , José-Miguel Díaz-Báñez , Nadine Kroher

A Parameter-free Affinity Based Clustering

Several methods have been proposed to estimate the number of clusters in a dataset; the basic ideal behind all of them has been to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster…

Computer Vision and Pattern Recognition · Computer Science 2016-01-12 Bhaskar Mukhoty , Ruchir Gupta , Y. N. Singh

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science 2021-10-12 Tarek Naous , Srinjay Sarkar , Abubakar Abid , James Zou

Spontaneous Clustering via Minimum \gamma-divergence

We propose a new method for clustering based on the local minimization of the \gamma-divergence, which we call the spontaneous clustering. The greatest advantage of the proposed method is that it automatically detects the number of clusters…

Methodology · Statistics 2013-05-01 Akifumi Notsu , Osamu Komori , Shinto Eguchi

Scalable Initialization Methods for Large-Scale Clustering

In this work, two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means|| type of an initialization strategy. The second proposal also utilizes…

Machine Learning · Computer Science 2020-07-24 Joonas Hämäläinen , Tommi Kärkkäinen , Tuomo Rossi

A provable initialization and robust clustering method for general mixture models

Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Most recent results focus primarily on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian…

Statistics Theory · Mathematics 2024-10-24 Soham Jana , Jianqing Fan , Sanjeev Kulkarni

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Seyed Omid Mohammadi , Ahmad Kalhor , Hossein Bodaghi

A Random Finite Set Model for Data Clustering

The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a…

Machine Learning · Statistics 2017-03-16 Dinh Phung , Ba-Ngu Bo

Multiple Independent Subspace Clusterings

Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it's still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this…

Machine Learning · Computer Science 2019-07-29 Xing Wang , Jun Wang , Carlotta Domeniconi , Guoxian Yu , Guoqiang Xiao , Maozu Guo

A Novel Algorithm for Informative Meta Similarity Clusters Using Minimum Spanning Tree

The minimum spanning tree clustering algorithm is capable of detecting clusters with irregular boundaries. In this paper we propose two minimum spanning trees based clustering algorithm. The first algorithm produces k clusters with center…

Other Computer Science · Computer Science 2010-05-26 S. John Peter , S. P. Victor

A Complex Networks Approach for Data Clustering

Many methods have been developed for data clustering, such as k-means, expectation maximization and algorithms based on graph theory. In this latter case, graphs are generally constructed by taking into account the Euclidian distance as a…

Data Analysis, Statistics and Probability · Physics 2011-01-27 Francisco A. Rodrigues , Guilherme Ferraz de Arruda , Luciano da Fontoura Costa

Data-Driven Clustering via Parameterized Lloyd's Families

Algorithms for clustering points in metric spaces is a long-studied area of research. Clustering has seen a multitude of work both theoretically, in understanding the approximation guarantees possible for many objective functions such as…

Data Structures and Algorithms · Computer Science 2019-05-27 Maria-Florina Balcan , Travis Dick , Colin White

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

The problem of estimating the number of clusters (say k) is one of the major challenges for the partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…

Machine Learning · Computer Science 2025-01-28 Duy-Tai Dinh , Tsutomu Fujinami , Van-Nam Huynh

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Efficient Parameter-free Clustering Using First Neighbor Relations

We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains…

Computer Vision and Pattern Recognition · Computer Science 2019-03-01 M. Saquib Sarfraz , Vivek Sharma , Rainer Stiefelhagen

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can…

Machine Learning · Computer Science 2020-09-23 Ali Hassani , Amir Iranmanesh , Mahdi Eftekhari , Abbas Salemi

Clustering by Constructing Hyper-Planes

As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Luhong Diao , Jinying Gao1 , Manman Deng