Related papers: Sets Clustering

k-Means Clustering of Lines for Big Data

The input to the $k$-means for lines problem is a set $L$ of $n$ lines in $\mathbb{R}^d$, and the goal is to compute a set of $k$ centers (points) in $\mathbb{R}^d$ that minimizes the sum of squared distances over every line in $L$ and its…

Computational Geometry · Computer Science 2019-11-26 Yair Marom , Dan Feldman

Coresets for $k$-Means and $k$-Median Clustering and their Applications

In this paper, we show the existence of small coresets for the problems of computing $k$-median and $k$-means…

Computational Geometry · Computer Science 2018-10-31 Sariel Har-Peled , Soham Mazumdar

A New Coreset Framework for Clustering

Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances, raised to the power $z$, of every point to its closest center is minimized. This encapsulates the famous $k$-median…

Data Structures and Algorithms · Computer Science 2022-08-01 Vincent Cohen-Addad , David Saulpic , Chris Schwiegelshohn

Linear time small coresets for k-mean clustering of segments with applications

We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S,x)$, where…

Machine Learning · Computer Science 2025-11-21 David Denisov , Shlomi Dolev , Dan Feldman , Michael Segal

A Simple PTAS for Weighted $k$-means and Sensor Coverage

Clustering is a fundamental technique in data analysis, with $k$-means being one of the most widely studied objectives due to its simplicity and broad applicability. In many practical scenarios, data points come with associated weights that…

Data Structures and Algorithms · Computer Science 2025-08-11 Akash Pareek , Supratim Shit

Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

The $k$-median and $k$-means clustering objectives are classic objectives for modeling clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative…

Computational Geometry · Computer Science 2026-03-11 Vincent Cohen-Addad , Karthik C. S. , David Saulpic , Chris Schwiegelshohn

Introduction to Coresets: Approximated Mean

A \emph{strong coreset} for the mean queries of a set $P$ in ${\mathbb{R}}^d$ is a small weighted subset $C\subseteq P$, which provably approximates its sum of squared distances to any center (point) $x\in {\mathbb{R}}^d$. A \emph{weak…

Machine Learning · Computer Science 2021-11-05 Alaa Maalouf , Ibrahim Jubran , Dan Feldman

Clustering with Set Outliers and Applications in Relational Clustering

We introduce and study the $k$-center clustering problem with set outliers, a natural and practical generalization of the classical $k$-center clustering with outliers. Instead of removing individual data points, our model allows discarding…

Data Structures and Algorithms · Computer Science 2025-12-23 Vaishali Surianarayanan , Neeraj Kumar , Stavros Sintos

Range-Clustering Queries

In a geometric $k$-clustering problem the goal is to partition a set of points in $\mathbb{R}^d$ into $k$ subsets such that a certain cost function of the clustering is minimized. We present data structures for orthogonal range-clustering…

Computational Geometry · Computer Science 2017-05-18 Mikkel Abrahamsen , Mark de Berg , Kevin Buchin , Mehran Mehr , Ali D. Mehrabi

On Variants of k-means Clustering

\textit{Clustering problems} often arise in fields such as data mining and machine learning, where the task is to partition a collection of objects into groups of similar objects with respect to a similarity (or dissimilarity) measure. Among the clustering problems,…

Computational Geometry · Computer Science 2015-12-10 Sayan Bandyapadhyay , Kasturi Varadarajan

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Given a collection of $n$ points in $\mathbb{R}^d$, the goal of the $(k,z)$-clustering problem is to find a subset of $k$ "centers" that minimizes the sum of the $z$-th powers of the Euclidean distance of each point to the closest center.…

Computational Geometry · Computer Science 2020-05-15 Lingxiao Huang , Nisheeth K. Vishnoi

A simple D^2-sampling based PTAS for k-means and other Clustering Problems

Given a set of points $P \subset \mathbb{R}^d$, the $k$-means clustering problem is to find a set of $k$ {\em centers} $C = \{c_1,...,c_k\}, c_i \in \mathbb{R}^d,$ such that the objective function $\sum_{x \in P} d(x,C)^2$, where $d(x,C)$…

Data Structures and Algorithms · Computer Science 2012-01-23 Ragesh Jaiswal , Amit Kumar , Sandeep Sen

Introduction to Core-sets: an Updated Survey

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering…

Machine Learning · Computer Science 2020-11-19 Dan Feldman

Clustering with Neighborhoods

In the standard planar $k$-center clustering problem, one is given a set $P$ of $n$ points in the plane, and the goal is to select $k$ center points, so as to minimize the maximum distance over points in $P$ to their nearest center. Here we…

Computational Geometry · Computer Science 2021-09-29 Hongyao Huang , Georgiy Klimenko , Benjamin Raichel

Faster Algorithms for the Constrained k-means Problem

The classical center-based clustering problems such as $k$-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise…

Data Structures and Algorithms · Computer Science 2015-04-13 Anup Bhattacharya , Ragesh Jaiswal , Amit Kumar

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…

Quantum Physics · Physics 2023-06-06 Yecheng Xue , Xiaoyu Chen , Tongyang Li , Shaofeng H.-C. Jiang

Clustering in Varying Metrics

We introduce the aggregated clustering problem, where one is given $T$ instances of a center-based clustering task over the same $n$ points, but under different metrics. The goal is to open $k$ centers to minimize an aggregate of the…

Data Structures and Algorithms · Computer Science 2025-10-10 Deeparnab Chakrabarty , Jonathan Conroy , Ankita Sarkar

On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First,…

Data Structures and Algorithms · Computer Science 2019-02-27 Amit Deshpande , Anand Louis , Apoorv Vikram Singh

A Unified Framework for Approximating and Clustering Data

Given a set $F$ of $n$ positive functions over a ground set $X$, we consider the problem of computing $x^*$ that minimizes the expression $\sum_{f\in F}f(x)$, over $x\in X$. A typical application is \emph{shape fitting}, where we wish to…

Machine Learning · Computer Science 2016-05-31 Dan Feldman , Michael Langberg

Exact Exponential Algorithms for Clustering Problems

In this paper we initiate a systematic study of exact algorithms for well-known clustering problems, namely $k$-Median and $k$-Means. In $k$-Median, the input consists of a set $X$ of $n$ points belonging to a metric space, and the task is…

Data Structures and Algorithms · Computer Science 2022-08-16 Fedor V. Fomin , Petr A. Golovach , Tanmay Inamdar , Nidhi Purohit , Saket Saurabh