Related papers: Improved Approximation Algorithms for Relational C…

Relational Algorithms for k-means Clustering

This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. This is an algorithm that operates directly on a relational database without performing a join to convert it to a matrix whose rows…

Data Structures and Algorithms · Computer Science 2021-05-24 Benjamin Moseley , Kirk Pruhs , Alireza Samadian , Yuyan Wang

An efficient K -means clustering algorithm for massive data

The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…

Machine Learning · Computer Science 2020-09-23 Sanjoy Dasgupta , Nave Frost , Michal Moshkovitz , Cyrus Rashtchian

Rk-means: Fast Clustering for Relational Data

Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be…

Machine Learning · Computer Science 2019-10-14 Ryan Curtin , Ben Moseley , Hung Q. Ngo , XuanLong Nguyen , Dan Olteanu , Maximilian Schleich

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta , Rong Jin , Timothy C. Havens , Anil K. Jain

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…

Quantum Physics · Physics 2023-06-06 Yecheng Xue , Xiaoyu Chen , Tongyang Li , Shaofeng H. -C. Jiang

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…

Machine Learning · Computer Science 2023-10-26 Moses Charikar , Monika Henzinger , Lunjia Hu , Maxmilian Vötsch , Erik Waingarten

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

Fast Approximate $K$-Means via Cluster Closures

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…

Computer Vision and Pattern Recognition · Computer Science 2013-12-12 Jingdong Wang , Jing Wang , Qifa Ke , Gang Zeng , Shipeng Li

Log-Time K-Means Clustering for 1D Data: Novel Approaches with Proof and Implementation

Clustering is a key task in machine learning, with $k$-means being widely used for its simplicity and effectiveness. While 1D clustering is common, existing methods often fail to exploit the structure of 1D data, leading to inefficiencies.…

Data Structures and Algorithms · Computer Science 2024-12-25 Jake Hyun

An efficient K-means algorithm for Massive Data

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to ma- nipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…

Machine Learning · Statistics 2016-05-11 Marco Capó , Aritz Pérez , José Antonio Lozano

Near-Optimal Clustering in the $k$-machine model

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-24 Sayan Bandyapadhyay , Tanmay Inamdar , Shreyas Pai , Sriram V. Pemmaraju

New Algorithms and Hardness Results for Connected Clustering

Connected clustering denotes a family of constrained clustering problems in which we are given a distance metric and an undirected connectivity graph $G$ that can be completely unrelated to the metric. The aim is to partition the $n$…

Data Structures and Algorithms · Computer Science 2025-11-25 Jan Eube , Heiko Röglin

Accelerating k-Means Clustering with Cover Trees

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…

Machine Learning · Computer Science 2024-10-22 Andreas Lang , Erich Schubert

Fast Clustering using MapReduce

Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-09-09 Alina Ene , Sungjin Im , Benjamin Moseley

Improved Algorithms for Clustering with Noisy Distance Oracles

Bateni et al. has recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong-oracle and weak-oracle in the weak-strong oracle model,…

Data Structures and Algorithms · Computer Science 2026-02-23 Pinki Pradhan , Anup Bhattacharya , Ragesh Jaiswal

Approximation Algorithms for K-Modes Clustering

In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and…

Artificial Intelligence · Computer Science 2007-05-23 Zengyou He

Approximation Algorithm for Constrained $k$-Center Clustering: A Local Search Approach

Clustering is a long-standing research problem and a fundamental tool in AI and data analysis. The traditional k-center problem, a fundamental theoretical challenge in clustering, has a best possible approximation ratio of 2, and any…

Machine Learning · Computer Science 2026-04-28 Chaoqi Jia , Longkun Guo , Kewen Liao , Zhigang Lu , Chao Chen , Jason Xue

Balancing clusters to reduce response time variability in large scale image search

Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query…

Computer Vision and Pattern Recognition · Computer Science 2010-09-27 Romain Tavenard , Laurent Amsaleg , Hervé Jégou