Related papers: A H-K Clustering Algorithm For High Dimensional Da…

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev

Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions

In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$.…

Data Structures and Algorithms · Computer Science 2026-03-12 Kangke Cheng , Shihong Song , Guanlin Mo , Hu Ding

High-Dimensional Data Clustering

Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty is due to the fact that high-dimensional data usually live in different low-dimensional subspaces…

Statistics Theory · Mathematics 2016-08-16 Charles Bouveyron , Stéphane Girard , Cordelia Schmid

Dimensionality Reduction for $k$-means Clustering

We present a study on how to effectively reduce the dimensions of the $k$-means clustering problem, so that provably accurate approximations are obtained. Four algorithms are presented, two \textit{feature selection} and two \textit{feature…

Machine Learning · Computer Science 2020-07-28 Neophytos Charalambides

High-dimensional cluster analysis with the Masked EM Algorithm

Cluster analysis faces two problems in high dimensions: first, the `curse of dimensionality' that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large…

Quantitative Methods · Quantitative Biology 2013-09-12 Shabnam N. Kadir , Dan F. M. Goodman , Kenneth D. Harris

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data

Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers using resampling approach. When performing cluster analysis in…

Methodology · Statistics 2019-09-05 Yujia Li , Xiangrui Zeng , Chien-Wei Lin , George Tseng

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

Clustering ensemble algorithm with high-order consistency learning

Most of the research on clustering ensemble focuses on designing practical consistency learning algorithms.To solve the problems that the quality of base clusters varies and the low-quality base clusters have an impact on the performance of…

Machine Learning · Computer Science 2024-11-04 Jianwen Gan , Yan Chen , Peng Zhou , Liang Du

A cutting plane algorithm for globally solving low dimensional k-means clustering problems

Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the…

Optimization and Control · Mathematics 2024-02-22 Martin Ryner , Jan Kronqvist , Johan Karlsson

Subspace Clustering through Sub-Clusters

The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular we…

Machine Learning · Statistics 2020-06-12 Weiwei Li , Jan Hannig , Sayan Mukherjee

Robust Clustering on High-Dimensional Data with Stochastic Quantization

This paper addresses the limitations of conventional vector quantization algorithms, particularly K-Means and its variant K-Means++, and investigates the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional…

Machine Learning · Computer Science 2025-03-11 Anton Kozyriev , Vladimir Norkin

Robust Clustering using Hyperdimensional Computing

This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is…

Machine Learning · Computer Science 2024-04-19 Lulu Ge , Keshab K. Parhi

A Computational Approach to Improving Fairness in K-means Clustering

The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation. Some cluster may have disproportionately more (or fewer) points from one of the subpopulations in terms of some…

Machine Learning · Computer Science 2026-02-10 Guancheng Zhou , Haiping Xu , Hongkang Xu , Chenyu Li , Donghui Yan

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…

Machine Learning · Statistics 2020-11-13 Joshua Tobin , Mimi Zhang

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev , Rustam Mussabayev

Memory Enriched Big Bang Big Crunch Optimization Algorithm for Data Clustering

Cluster analysis plays an important role in decision making process for many knowledge-based systems. There exist a wide variety of different approaches for clustering applications including the heuristic techniques, probabilistic models,…

Artificial Intelligence · Computer Science 2017-03-09 Kayvan Bijari , Hadi Zare , Hadi Veisi , Hossein Bobarshad

A hybrid clustering algorithm for data mining

Data clustering is a process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is better than among groups. In this paper a hybrid clustering…

Databases · Computer Science 2012-05-25 Ravindra Jain

An efficient K -means clustering algorithm for massive data

The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano