Related papers: Solving clustering as ill-posed problem: experimen…

Learning-Augmented K-Means Clustering Using Dimensional Reduction

Learning augmented is a machine learning concept built to improve the performance of a method or model, such as enhancing its ability to predict and generalize data or features, or testing the reliability of the method by introducing noise…

Machine Learning · Computer Science 2024-01-09 Issam K. O Jabari, Shofiyah, Pradiptya Kahvi S, Novi Nur Putriwijaya, Novanto Yudistira

An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

Principal Component Analysis (PCA) and K-means constitute fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially…

Machine Learning · Statistics 2025-12-02 Victor Saquicela, Kenneth Palacio-Baus, Mario Chifla

A clustering approach for pairwise comparison matrices

We consider clustering in group decision making where the opinions are given by pairwise comparison matrices. In particular, the k-medoids model is suggested to classify the matrices since it has a linear programming problem formulation…

Optimization and Control · Mathematics 2025-04-17 Kolos Csaba Ágoston, Sándor Bozóki, László Csató

Discriminative k-means clustering

The k-means algorithm is a partitional clustering method. Over 60 years old, it has been successfully used for a variety of problems. The popularity of k-means is in large part a consequence of its simplicity and efficiency. In this paper…

Computer Vision and Pattern Recognition · Computer Science 2013-06-11 Ognjen Arandjelovic

A random version of principal component analysis in data clustering

Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance-correlation matrix of the analyzed data. However, to work properly with high-dimensional data, PCA poses severe mathematical…

Quantitative Methods · Quantitative Biology 2018-10-18 Luigi Leonardo Palese

On the clustering of correlated random variables

In this work, the possibility of clustering correlated random variables was examined, both because of their mutual similarity and because of their similarity to the principal components. The k-means algorithm and spectral algorithms were…

Machine Learning · Computer Science 2019-09-10 Zenon Gniazdowski, Dawid Kaliszewski

Probabilistic Partitive Partitioning (PPP)

Clustering is an NP-hard problem. Thus, no efficient optimal algorithm is known, and heuristics are applied to cluster the data. Heuristics can be very resource-intensive if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering

We develop and analyze a method to reduce the size of a very large set of data points in a high dimensional Euclidean space R^d to a small set of weighted points such that the result of a predetermined data analysis task on the reduced set…

Data Structures and Algorithms · Computer Science 2018-07-13 Dan Feldman, Melanie Schmidt, Christian Sohler

Clustering and Feature Selection using Sparse Principal Component Analysis

In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of…

Artificial Intelligence · Computer Science 2008-10-08 Ronny Luss, Alexandre d'Aspremont

Using MM principles to deal with incomplete data in K-means clustering

Among many clustering algorithms, the K-means clustering algorithm is widely used because of its simple algorithm and fast convergence. However, this algorithm suffers from incomplete data, where some samples have missed some of their…

Machine Learning · Computer Science 2022-12-26 Ali Beikmohammadi

An Analytical Approach to Document Clustering Based on Internal Criterion Function

Fast, high-quality document clustering is an important task in organizing information and search engine results obtained from user queries, and in enhancing web crawling and information retrieval. With the large amount of data available and with a…

Information Retrieval · Computer Science 2010-03-11 Alok Ranjan, Harish Verma, Eatesh Kandpal, Joydip Dhar

A penalized criterion for selecting the number of clusters for K-medians

Clustering is a common unsupervised machine learning technique for grouping data points based upon similar features. We focus here on unsupervised clustering for contaminated data, i.e., in the case where K-medians should be…

Statistics Theory · Mathematics 2024-02-28 Antoine Godichon-Baggioni, Sobihan Surendran

Neural Capacitated Clustering

Recent work on deep clustering has found promising new methods also for constrained clustering problems. Their typically pairwise constraints can often be used to guide the partitioning of the data. Many problems, however, feature…

Machine Learning · Computer Science 2023-05-22 Jonas K. Falkner, Lars Schmidt-Thieme

A Computational Approach to Improving Fairness in K-means Clustering

The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation. Some clusters may have disproportionately more (or fewer) points from one of the subpopulations in terms of some…

Machine Learning · Computer Science 2026-02-10 Guancheng Zhou, Haiping Xu, Hongkang Xu, Chenyu Li, Donghui Yan

Quantization/clustering: when and why does k-means work?

Though mostly used as a clustering algorithm, k-means was originally designed as a quantization algorithm. Namely, it aims at providing a compression of a probability distribution with k points. Building upon [21, 33], we try to investigate…

Statistics Theory · Mathematics 2018-01-31 Clément Levrard

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó, Aritz Pérez, Jose A. Lozano

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining for generating groups of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms have gained the most popularity due to their…

Machine Learning · Computer Science 2014-11-25 Xiaojun Chang, Feiping Nie, Zhigang Ma, Yi Yang

Improved Performance of Unsupervised Method by Renovated K-Means

Clustering is the separation of data into groups of similar objects. Every group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented…

Machine Learning · Computer Science 2013-04-03 P. Ashok, G. M Kadhar Nawaz, E. Elayaraja, V. Vadivel

A Binary Optimization Approach for Constrained K-Means Clustering

K-Means clustering still plays an important role in many computer vision problems. While the conventional Lloyd method, which alternates between centroid update and cluster assignment, is primarily used in practice, it may converge to a…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Huu Le, Anders Eriksson, Thanh-Toan Do, Michael Milford