Related papers: Approximate kernel clustering
In the kernel clustering problem we are given a (large) $n\times n$ symmetric positive semidefinite matrix $A=(a_{ij})$ with $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$ and a (small) $k\times k$ symmetric positive semidefinite matrix $B=(b_{ij})$.…
Connected clustering denotes a family of constrained clustering problems in which we are given a distance metric and an undirected connectivity graph $G$ that can be completely unrelated to the metric. The aim is to partition the $n$…
We study the $k$-center problem in the context of individual fairness. Let $P$ be a set of $n$ points in a metric space and $r_x$ be the distance between $x \in P$ and its $\lceil n/k \rceil$-th nearest neighbor. The problem asks to…
In this work, we study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups. A clustering solution needs to ensure that the number of chosen cluster centers from…
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real-world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor…
We introduce the aggregated clustering problem, where one is given $T$ instances of a center-based clustering task over the same $n$ points, but under different metrics. The goal is to open $k$ centers to minimize an aggregate of the…
In this paper we study the constrained subspace approximation problem. Given a set of $n$ points $\{a_1,\ldots,a_n\}$ in $\mathbb{R}^d$, the goal of the {\em subspace approximation} problem is to find a $k$-dimensional subspace that best…
It is shown that every measurable partition $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^3$ satisfies $$\sum_{i=1}^k\left\|\int_{A_i} xe^{-\frac12\|x\|_2^2}dx\right\|_2^2\le 9\pi^2.\qquad(*)$$ Let $\{P_1,P_2,P_3\}$ be the partition of $\mathbb{R}^2$ into $120^\circ$…
The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in bioinformatics, image processing, and social network analysis). As sizes of data sets have grown…
The problem of constrained $k$-center clustering has attracted significant attention in the past decades. In this paper, we study balanced $k$-center clustering, where the size of each cluster is constrained by given lower and upper bounds.…
Clustering is one of the most fundamental problems in Machine Learning. Researchers in the field often require a lower bound on the size of the clusters to maintain anonymity and an upper bound for ease of analysis. Specifying an optimal…
Clustering is a central primitive in unsupervised learning, yet practice is dominated by heuristics whose outputs can be unstable and highly sensitive to representations, hyperparameters, and initialisation. Existing theoretical results are…
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points $x_1,\ldots,x_n \in \mathbb{R}^d$. In particular, we consider estimating the…
We consider the classical $k$-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output $\beta k > k$ clusters, and must produce a clustering with cost at most $\alpha$ times the…
In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any…
We study a variant of classical clustering formulations in the context of algorithmic fairness, known as diversity-aware clustering. In this variant we are given a collection of facility subsets, and a solution must contain at least a…
$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…
Ensuring fairness in machine learning algorithms is a challenging and essential task. We consider the problem of clustering a set of points while satisfying fairness constraints. While there have been several attempts to capture group…
Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the…