Related papers: Classifying variable-structures: a general framewo…

Improved Performance of Unsupervised Method by Renovated K-Means

Clustering is a separation of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented…

Machine Learning · Computer Science 2013-04-03 P. Ashok, G. M Kadhar Nawaz, E. Elayaraja, V. Vadivel

Merging $K$-means with hierarchical clustering for identifying general-shaped groups

Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and $K$-means clustering are two approaches but have different strengths and weaknesses.…

Machine Learning · Statistics 2017-12-27 Anna D. Peterson, Arka P. Ghosh, Ranjan Maitra

On the clustering of correlated random variables

In this work, the possibility of clustering correlated random variables was examined, both because of their mutual similarity and because of their similarity to the principal components. The k-means algorithm and spectral algorithms were…

Machine Learning · Computer Science 2019-09-10 Zenon Gniazdowski, Dawid Kaliszewski

Hierarchical variable clustering based on the predictive strength between random vectors

A rank-invariant clustering of variables is introduced that is based on the predictive strength between groups of variables, i.e., two groups are assigned a high similarity if the variables in the first group contain high predictive…

Methodology · Statistics 2023-12-29 Sebastian Fuchs, Yuping Wang

Generalizing k-means for an arbitrary distance matrix

The original k-means clustering method works only if the exact vectors representing the data points are known. Therefore calculating the distances from the centroids needs vector operations, since the average of abstract data points is…

Machine Learning · Computer Science 2013-03-26 Balázs Szalkai

A new distance measurement and its application in K-Means Algorithm

The K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. The K-Means clustering algorithm based on Euclidean distance only pays attention to the linear distance between…

Machine Learning · Computer Science 2022-06-13 Yiqun Zhang, Houbiao Li

Normalization based K means Clustering Algorithm

K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, a normalization-based K-means clustering algorithm (N-K means) is proposed. Proposed N-K means…

Machine Learning · Computer Science 2015-03-04 Deepali Virmani, Shweta Taneja, Geetika Malhotra

Combining clustering of variables and feature selection using random forests

Standard approaches to tackling high-dimensional supervised classification problems often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature…

Statistics Theory · Mathematics 2018-11-07 Marie Chavent, Robin Genuer, Jerome Saracco

$k$-Variance: A Clustered Notion of Variance

We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $K$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local…

Statistics Theory · Mathematics 2020-12-15 Justin Solomon, Kristjan Greenewald, Haikady N. Nagaraja

Clustering -- Basic concepts and methods

We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task?…

Machine Learning · Computer Science 2022-12-05 Jan-Oliver Felix Kapp-Joswig, Bettina G. Keller

Accelerating k-Means Clustering with Cover Trees

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang, Erich Schubert

Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm

This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding…

Machine Learning · Computer Science 2025-06-02 Vardhan Shorewala, Shivam Shorewala

Hierarchical clustering of mixed-type data based on barycentric coding

Clustering of mixed-type datasets can be a particularly challenging task as it requires taking into account the associations between variables with different levels of measurement, i.e., nominal, ordinal and/or interval. In some cases,…

Methodology · Statistics 2022-04-22 Odysseas Moschidis, Angelos Markos, Theodore Chadjipadelis

Federated k-Means over Networks

We study federated clustering, where interconnected devices collaboratively cluster the data points of private local datasets. Focusing on hard clustering via the k-means principle, we formulate federated k-means as an instance of…

Machine Learning · Computer Science 2026-01-29 Xu Yang, Salvatore Rastelli, Alexander Jung

The K-modes algorithm for clustering

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a…

Machine Learning · Computer Science 2013-04-25 Miguel Á. Carreira-Perpiñán, Weiran Wang

A review of mean-shift algorithms for clustering

A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found…

Machine Learning · Computer Science 2015-03-03 Miguel Á. Carreira-Perpiñán

Robust and sparse k-means clustering for high-dimensional data

In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise variables. Therefore, there is a need for a clustering method which is capable of…

Methodology · Statistics 2017-09-29 Sarka Brodinova, Peter Filzmoser, Thomas Ortner, Christian Breiteneder, Maia Zaharieva

The Laplacian K-modes algorithm for clustering

In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster. This is…

Machine Learning · Computer Science 2014-06-17 Weiran Wang, Miguel Á. Carreira-Perpiñán

Unsupervised classification of uncertain data objects in spatial databases using computational geometry and indexing techniques

Unsupervised classification, called clustering, is a process of organizing objects into groups whose members are similar in some way. Clustering of uncertain data objects is a challenge in spatial databases. In this paper we use Probability…

Databases · Computer Science 2013-12-10 Ramachandra Rao Kurada

A Comparative Agglomerative Hierarchical Clustering Method to Cluster Implemented Course

There are many clustering methods, such as the hierarchical clustering method. Most of the approaches to the clustering of variables encountered in the literature are of hierarchical type. The great majority of hierarchical approaches to the…

Databases · Computer Science 2011-01-25 Rahmat Widia Sembiring, Jasni Mohamad Zain, Abdullah Embong