Related papers: Regularization and Optimization in Model-Based Clu…

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev, Rustam Mussabayev

Clustering Semi-Random Mixtures of Gaussians

Gaussian mixture models (GMM) are the most widely used statistical model for the $k$-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random…

Data Structures and Algorithms · Computer Science 2017-11-27 Pranjal Awasthi, Aravindan Vijayaraghavan

Regularized EM algorithm

Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing (local) maximum likelihood estimate (MLE). It can be used in an extensive range of problems, including the clustering of data based on the Gaussian…

Machine Learning · Statistics 2023-03-28 Pierre Houdouin, Esa Ollila, Frederic Pascal

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a…

Machine Learning · Statistics 2024-10-16 Yijia Zhou, Kyle A. Gallivan, Adrian Barbu

$K$-Means and Gaussian Mixture Modeling with a Separation Constraint

We consider the problem of clustering with $K$-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the…

Computation · Statistics 2023-01-24 He Jiang, Ery Arias-Castro

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Number of Clusters in a Dataset: A Regularized K-means Approach

Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. Regularized k-means algorithm is a possible approach frequently used to find the correct number of distinct clusters in datasets. The most…

Machine Learning · Computer Science 2025-05-30 Behzad Kamgar-Parsi, Behrooz Kamgar-Parsi

An Observation on Lloyd's k-Means Algorithm in High Dimensions

Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the…

Machine Learning · Statistics 2025-06-19 David Silva-Sánchez, Roy R. Lederman

Using MM principles to deal with incomplete data in K-means clustering

Among many clustering algorithms, the K-means clustering algorithm is widely used because of its simple algorithm and fast convergence. However, this algorithm suffers from incomplete data, where some samples have missed some of their…

Machine Learning · Computer Science 2022-12-26 Ali Beikmohammadi

Comparison of Clustering Algorithms for Statistical Features of Vibration Data Sets

Vibration-based condition monitoring systems are receiving increasing attention due to their ability to accurately identify different conditions by capturing dynamic features over a broad frequency range. However, there is little research…

Machine Learning · Computer Science 2023-05-12 Philipp Sepin, Jana Kemnitz, Safoura Rezapour Lakani, Daniel Schall

Learning Mixtures of Gaussians using the k-means Algorithm

One of the most popular algorithms for clustering in Euclidean space is the $k$-means algorithm; $k$-means is difficult to analyze mathematically, and few theoretical guarantees are known about it, particularly when the data is…

Machine Learning · Computer Science 2009-12-02 Kamalika Chaudhuri, Sanjoy Dasgupta, Andrea Vattani

The Informativeness of K-Means for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu, Vincent Y. F. Tan

Unsupervised Learning of GMM with a Uniform Background Component

Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and could influence the cluster estimation. In this paper, we study a new model that assumes…

Machine Learning · Statistics 2020-03-24 Sida Liu, Adrian Barbu

Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes…

Machine Learning · Computer Science 2023-02-15 Dong Li, Shuisheng Zhou, Tieyong Zeng, Raymond H. Chan

Quantum Expectation-Maximization Algorithm

Clustering algorithms are a cornerstone of machine learning applications. Recently, a quantum algorithm for clustering based on the k-means algorithm has been proposed by Kerenidis, Landman, Luongo and Prakash. Based on their work, we…

Quantum Physics · Physics 2020-01-23 Hideyuki Miyahara, Kazuyuki Aihara, Wolfgang Lechner

Robust Clustering Using Outlier-Sparsity Regularization

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the…

Machine Learning · Statistics 2015-05-27 Pedro A. Forero, Vassilis Kekatos, Georgios B. Giannakis

A simulation study of cluster search algorithms in data set generated by Gaussian mixture models

Determining the number of clusters is a fundamental issue in data clustering. Several algorithms have been proposed, including centroid-based algorithms using the Euclidean distance and model-based algorithms using a mixture of probability…

Machine Learning · Computer Science 2024-07-30 Ryosuke Motegi, Yoichi Seki

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…

Databases · Computer Science 2020-03-11 Mujahid Sultan

A Binary Optimization Approach for Constrained K-Means Clustering

K-Means clustering still plays an important role in many computer vision problems. While the conventional Lloyd method, which alternates between centroid update and cluster assignment, is primarily used in practice, it may converge to a…

Computer Vision and Pattern Recognition · Computer Science 2018-10-30 Huu Le, Anders Eriksson, Thanh-Toan Do, Michael Milford

Unsupervised Selective Manifold Regularized Matrix Factorization

Manifold regularization methods for matrix factorization rely on the cluster assumption, whereby the neighborhood structure of data in the input space is preserved in the factorization space. We argue that using the k-neighborhoods of all…

Machine Learning · Computer Science 2020-10-21 Priya Mani, Carlotta Domeniconi, Igor Griva