Related papers: Faster Clustering via Preprocessing

Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions

In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$.…

Data Structures and Algorithms · Computer Science 2026-03-12 Kangke Cheng, Shihong Song, Guanlin Mo, Hu Ding

Streaming k-Means Clustering with Fast Queries

We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster…

Data Structures and Algorithms · Computer Science 2018-12-10 Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculation between all data points and the centers, the time cost will be high when the size of…

Data Structures and Algorithms · Computer Science 2011-08-08 Raied Salman, Vojislav Kecman, Qi Li, Robert Strack, Erik Test

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

A parallel sampling based clustering

The problem of automatically clustering data is an age-old problem. People have created numerous algorithms to tackle this problem. The execution time of any of these algorithms grows with the number of input points and the number of cluster…

Machine Learning · Computer Science 2014-12-08 Aditya AV Sastry, Kalyan Netti

Efficient Large Scale Clustering based on Data Partitioning

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…

Databases · Computer Science 2018-02-27 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Clustering by Constructing Hyper-Planes

As a kind of basic machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm by finding hyper-planes to distinguish the data…

Computer Vision and Pattern Recognition · Computer Science 2020-04-28 Luhong Diao, Jinying Gao, Manman Deng

Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Clustering is a core task in machine learning with wide-ranging applications in data mining and pattern recognition. However, its unsupervised nature makes it inherently challenging. Many existing clustering algorithms suffer from critical…

Machine Learning · Computer Science 2025-07-29 Ahmed Shokry, Ayman Khalafallah

Clustering in statistical ill-posed linear inverse problems

In many statistical linear inverse problems, one needs to recover classes of similar curves from their noisy images under an operator that does not have a bounded inverse. Problems of this kind appear in many areas of application.…

Statistics Theory · Mathematics 2020-03-24 Rasika Rajapakshage, Marianna Pensky

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…

Quantum Physics · Physics 2023-06-06 Yecheng Xue, Xiaoyu Chen, Tongyang Li, Shaofeng H.-C. Jiang

Coresets for Kernel Clustering

We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are…

Data Structures and Algorithms · Computer Science 2024-04-09 Shaofeng H.-C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly…

Machine Learning · Computer Science 2024-04-03 Andrew Draganov, David Saulpic, Chris Schwiegelshohn

Faster Balanced Clusterings in High Dimension

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…

Computational Geometry · Computer Science 2018-09-11 Hu Ding

Introduction to Core-sets: an Updated Survey

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Leveraging Union of Subspace Structure to Improve Constrained Clustering

Many clustering problems in computer vision and other contexts are also classification problems, where each cluster shares a meaningful label. Subspace clustering algorithms in particular are often applied to problems that fit this…

Machine Learning · Computer Science 2017-09-15 John Lipor, Laura Balzano

Greedy Subspace Clustering

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses…

Machine Learning · Statistics 2014-11-03 Dohyung Park, Constantine Caramanis, Sujay Sanghavi

Faster Projective Clustering Approximation of Big Data

In projective clustering we are given a set of $n$ points in $\mathbb{R}^d$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $\mathbb{R}^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled)…

Data Structures and Algorithms · Computer Science 2020-11-30 Adiel Statman, Liat Rozenberg, Dan Feldman

Distributed Partial Clustering

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting communication…

Data Structures and Algorithms · Computer Science 2017-06-06 Sudipto Guha, Yi Li, Qin Zhang

Clustering For Point Pattern Data

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited…

Machine Learning · Computer Science 2017-02-09 Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo

The Utility of Clustering in Prediction Tasks

We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more…

Machine Learning · Computer Science 2015-09-22 Shubhendu Trivedi, Zachary A. Pardos, Neil T. Heffernan