Related papers: Approximation Algorithms for K-Modes Clustering

The K-modes algorithm for clustering

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a…

Machine Learning · Computer Science 2013-04-25 Miguel Á. Carreira-Perpiñán , Weiran Wang
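
The abstract is truncated before it states the definition. As an assumption for illustration, suppose each of the K centroids is placed at a mode of a Gaussian kernel density estimate (KDE) built only from its own cluster. The minimal sketch below then alternates two steps. First, each point is assigned to the centroid with the largest kernel value, which for a Gaussian kernel is the nearest centroid. Second, each centroid is moved uphill on its cluster's KDE by mean-shift iterations. The names `gaussian_k_modes`, `sigma`, `n_iter` and `n_shift` are hypothetical, and this is not presented as the authors' exact procedure.

```python
import math


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centroids):
    # For a Gaussian kernel the largest kernel value is attained by the
    # nearest centroid, so assignment reduces to nearest-centroid.
    return [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            for x in data]


def mean_shift_step(c, members, sigma):
    # One Gaussian mean-shift update: move c to the kernel-weighted
    # average of its cluster's points, climbing that cluster's KDE.
    weights = [math.exp(-sq_dist(c, p) / (2 * sigma ** 2)) for p in members]
    total = sum(weights)
    if total == 0.0:
        return c  # every weight underflowed; keep the current centroid
    return tuple(sum(w * p[d] for w, p in zip(weights, members)) / total
                 for d in range(len(c)))


def gaussian_k_modes(data, init, sigma=1.0, n_iter=50, n_shift=20):
    centroids = [tuple(c) for c in init]
    for _ in range(n_iter):
        labels = assign(data, centroids)
        updated = []
        for k, c in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # empty clusters keep their centroid
                for _step in range(n_shift):
                    c = mean_shift_step(c, members, sigma)
            updated.append(c)
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return centroids, assign(data, centroids)
```

As `sigma` grows, all weights approach 1 and the update approaches the plain cluster mean, so the sketch behaves like K-means. Smaller values pull each centroid toward a density peak of its own cluster.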

Approximation Algorithms for Clustering with Dynamic Points

We study two generalizations of classic clustering problems called dynamic ordered $k$-median and dynamic $k$-supplier, where the points that need clustering evolve over time, and we are allowed to move the cluster centers between…

Data Structures and Algorithms · Computer Science 2022-07-26 Shichuan Deng , Jian Li , Yuval Rabani

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

Optimal Time Bounds for Approximate Clustering

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect to the k-median…

Data Structures and Algorithms · Computer Science 2013-01-07 Ramgopal Mettu , Greg Plaxton

Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

The $k$-median and $k$-means clustering objectives are classic objectives for modeling clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative…

Computational Geometry · Computer Science 2026-03-11 Vincent Cohen-Addad , Karthik C. S. , David Saulpic , Chris Schwiegelshohn
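
For reference, the two objectives named in this abstract can be written as follows. The notation is illustrative and not taken from the paper: it assumes a finite point set $P$ in a metric space $(X, d)$ and a center set $C \subseteq X$ with $|C| \le k$. Each problem minimizes its cost over all such $C$.

```latex
\mathrm{cost}_{\text{median}}(C) = \sum_{p \in P} \min_{c \in C} d(p, c),
\qquad
\mathrm{cost}_{\text{means}}(C) = \sum_{p \in P} \min_{c \in C} d(p, c)^{2}
```

The only difference is the exponent on the distance. Squaring makes $k$-means penalize far-away points more heavily than $k$-median does.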

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

On Approximability of Clustering Problems Without Candidate Centers

The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located…

Computational Complexity · Computer Science 2020-10-08 Vincent Cohen-Addad , Karthik C. S. , Euiwoong Lee

An Efficient $k$-modes Algorithm for Clustering Categorical Datasets

Mining clusters from data is an important endeavor in many applications. The $k$-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but does not apply for categorical-valued…

Methodology · Statistics 2021-08-24 Karin S. Dorman , Ranjan Maitra
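
The abstract notes that $k$-means does not apply to categorical data. For orientation, below is a minimal sketch of the classic Lloyd-style $k$-modes heuristic. It measures distance as the number of mismatched attributes (Hamming distance) and takes each cluster's "mode" to be the most frequent category per attribute. It is not the efficient algorithm proposed in this paper, and the function names `hamming`, `assign` and `k_modes` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    # Number of attributes on which two categorical records disagree.
    return sum(x != y for x, y in zip(a, b))


def assign(records, modes):
    # Each record joins the mode with the fewest mismatches
    # (min() keeps the lower-indexed mode on ties).
    return [min(range(len(modes)), key=lambda k: hamming(r, modes[k]))
            for r in records]


def k_modes(records, init_modes, max_iter=100):
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        labels = assign(records, modes)
        new_modes = []
        for k, mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if not members:
                new_modes.append(mode)  # empty cluster keeps its old mode
                continue
            # Per attribute, the most frequent category minimizes the
            # cluster's total number of mismatches.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break  # modes stopped changing
        modes = new_modes
    return modes, assign(records, modes)
```

Like $k$-means, the result depends on the initial modes. The `max_iter` cap guards against ties that keep the modes from settling.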

A bi-criteria approximation algorithm for $k$ Means

We consider the classical $k$-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$ clusters, and must produce a clustering with cost at most $\alpha$ times the to the…

Data Structures and Algorithms · Computer Science 2015-08-04 Konstantin Makarychev , Yury Makarychev , Maxim Sviridenko , Justin Ward
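
The bi-criteria guarantee described above can be stated compactly. This sketch uses the standard definition and assumes a finite data set $X \subset \mathbb{R}^d$. The symbol $\mathrm{OPT}_k$ is our notation for the optimal cost with exactly $k$ centers, since the abstract is truncated before naming the baseline. An $(\alpha, \beta)$ bi-criteria algorithm returns a center set $C$ satisfying:

```latex
\mathrm{OPT}_k = \min_{|C^*| = k} \sum_{x \in X} \min_{c \in C^*} \lVert x - c \rVert^2,
\qquad
|C| \le \beta k,
\qquad
\sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2 \le \alpha \cdot \mathrm{OPT}_k
```

The trade-off is that extra centers ($\beta > 1$) are traded for a smaller cost factor $\alpha$ than is achievable with exactly $k$ centers.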

Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

Clustering is a classic topic in optimization with $k$-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for $k$-means with a provable guarantee is a simple…

Data Structures and Algorithms · Computer Science 2017-04-11 Sara Ahmadian , Ashkan Norouzi-Fard , Ola Svensson , Justin Ward

Improved Approximation Algorithms for Relational Clustering

Clustering plays a crucial role in computer science, facilitating data analysis and problem-solving across numerous fields. By partitioning large datasets into meaningful groups, clustering reveals hidden structures and relationships within…

Databases · Computer Science 2026-02-19 Aryan Esmailpour , Stavros Sintos

Improved Performance of Unsupervised Method by Renovated K-Means

Clustering is a separation of data into groups of similar objects. Every group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented…

Machine Learning · Computer Science 2013-04-03 P. Ashok , G. M Kadhar Nawaz , E. Elayaraja , V. Vadivel

On the Fixed-Parameter Tractability of Capacitated Clustering

We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high dimensional Euclidean…

Data Structures and Algorithms · Computer Science 2022-08-31 Vincent Cohen-Addad , Jason Li

The Laplacian K-modes algorithm for clustering

In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster. This is…

Machine Learning · Computer Science 2014-06-17 Weiran Wang , Miguel Á. Carreira-Perpiñán

Data-Driven Clustering via Parameterized Lloyd's Families

Algorithms for clustering points in metric spaces are a long-studied area of research. Clustering has seen a multitude of work both theoretically, in understanding the approximation guarantees possible for many objective functions such as…

Data Structures and Algorithms · Computer Science 2019-05-27 Maria-Florina Balcan , Travis Dick , Colin White
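
This abstract is truncated before it defines the family. As an assumption for illustration only, the sketch below shows $D^\alpha$ seeding, one common way to parameterize Lloyd-style initialization. Each new center is sampled with probability proportional to $d(x, C)^\alpha$, the distance from $x$ to its nearest chosen center raised to a power $\alpha$. Setting $\alpha = 0$ gives uniform sampling, $\alpha = 2$ gives $k$-means++ seeding, and large $\alpha$ approaches farthest-first traversal. The function `d_alpha_seeding` and its parameters are hypothetical. Only the seeding step is shown, not the subsequent Lloyd iterations.

```python
import math
import random


def d_alpha_seeding(points, k, alpha, seed=0):
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        dists = [min(math.dist(p, c) for c in centers) for p in points]
        d_max = max(dists)
        if d_max == 0.0:
            break  # fewer than k distinct points; stop early
        # Dividing by d_max leaves the sampling distribution unchanged but
        # keeps (d / d_max) ** alpha in [0, 1], avoiding overflow for large
        # alpha. Points sitting on a center get weight 0, so alpha = 0 is
        # uniform over the points not yet chosen.
        weights = [(d / d_max) ** alpha if d > 0 else 0.0 for d in dists]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

A single exponent thus sweeps between random, $k$-means++ and farthest-first seeding, which is the kind of knob a data-driven method could tune per application.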

Hybrid k-Clustering: Blending k-Median and k-Center

We propose a novel clustering model encompassing two well-known clustering models: k-center clustering and k-median clustering. In the Hybrid k-Clustering problem, given a set $P$ of points in $\mathbb{R}^d$, an integer $k$, and a non-negative real $r$, our…

Data Structures and Algorithms · Computer Science 2024-07-12 Fedor V. Fomin , Petr A. Golovach , Tanmay Inamdar , Saket Saurabh , Meirav Zehavi

Deterministic $k$-Median Clustering in Near-Optimal Time

The metric $k$-median problem is a textbook clustering problem. As input, we are given a metric space $V$ of size $n$ and an integer $k$, and our task is to find a subset $S \subseteq V$ of at most $k$ "centers" that minimizes the total…

Data Structures and Algorithms · Computer Science 2026-03-31 Martín Costa , Ermiya Farokhnejad
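
To make the objective concrete, here is a brute-force baseline for tiny instances. It assumes the metric is given as an $n \times n$ distance matrix `dist` and takes the cost of a center set to be the sum, over all points, of the distance to the closest center. It enumerates every $k$-subset of $V$, so it runs in time exponential in $k$ and is not the near-optimal-time deterministic algorithm of this paper. The names `k_median_cost` and `exact_k_median` are hypothetical. Checking only subsets of size exactly $k$ suffices when $k \le n$, because adding a center never increases the cost.

```python
from itertools import combinations


def k_median_cost(dist, centers):
    # Total distance from every point to its closest center in `centers`.
    return sum(min(dist[v][s] for s in centers) for v in range(len(dist)))


def exact_k_median(dist, k):
    # Exhaustive search over all k-subsets of the n points (requires k <= n).
    best = min(combinations(range(len(dist)), k),
               key=lambda S: k_median_cost(dist, S))
    return best, k_median_cost(dist, best)


# Five points on a line, with |a - b| as the metric.
xs = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in xs] for a in xs]
print(exact_k_median(dist, 2))  # ((1, 3), 3)
```

Here the optimum places one center at the middle of the left group and one in the right group. The subset $(1, 4)$ ties at cost 3, and `min` returns the first minimizer in enumeration order.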

The Informativeness of K-Means for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples…

Machine Learning · Statistics 2022-08-26 Zhaoqiang Liu , Vincent Y. F. Tan

Clustering is spotting patterns in a group of objects and, as a result, grouping the similar objects together. Objects have attributes which are not always numerical; sometimes attributes have domains or categories to which they could belong…

Machine Learning · Computer Science 2020-11-20 Utkarsh Nath , Shikha Asrani , Rahul Katarya

Reconciliation k-median: Clustering with Non-Polarized Representatives

We propose a new variant of the k-median problem, where the objective function models not only the cost of assigning data points to cluster representatives, but also a penalty term for disagreement among the representatives. We motivate…

Data Structures and Algorithms · Computer Science 2021-07-29 Bruno Ordozgoiti , Aristides Gionis