Related papers: Label-consistent clustering for evolving data
Designing efficient, effective, and consistent metric clustering algorithms is a significant challenge attracting growing attention. Traditional approaches focus on the stability of cluster centers; unfortunately, this neglects the…
The problem of constrained $k$-center clustering has attracted significant attention in the past decades. In this paper, we study balanced $k$-center cluster where the size of each cluster is constrained by the given lower and upper bounds.…
The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…
We study the consistent k-center clustering problem. In this problem, the goal is to maintain a constant factor approximate $k$-center solution during a sequence of $n$ point insertions and deletions while minimizing the recourse, i.e., the…
Clustering is one of the most fundamental problems in unsupervised learning with a large number of applications. However, classical clustering algorithms assume that the data is static, thus failing to capture many real-world applications…
We study two generalizations of classic clustering problems called dynamic ordered $k$-median and dynamic $k$-supplier, where the points that need clustering evolve over time, and we are allowed to move the cluster centers between…
The $k$-center problem is a fundamental clustering variant with applications in learning systems and data summarization. In several real-world scenarios, the dataset to be clustered is not static, but evolves over time, as new data points…
In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$.…
Given points from an arbitrary metric space and a sequence of point updates sent by an adversary, what is the minimum recourse per update (i.e., the minimum number of changes needed to the set of centers after an update), in order to…
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space, and a label for each data point given by an oracle indicating what subsets of points should be…
Center-based clustering techniques are fundamental in some areas of machine learning such as data summarization. Generic $k$-center algorithms can produce biased cluster representatives so there has been a recent interest in fair $k$-center…
Given a stream of points in a metric space, is it possible to maintain a constant approximate clustering by changing the cluster centers only a small number of times during the entire execution of the algorithm? This question received…
$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…
In discrete k-center and k-median clustering, we are given a set of points P in a metric space M, and the task is to output a set C \subseteq ? P, |C| = k, such that the cost of clustering P using C is as small as possible. For k-center,…
We study k-median clustering under the sequential no-substitution setting. In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers. However, a point can be selected as…
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A…
One key use of k-means clustering is to identify cluster prototypes which can serve as representative points for a dataset. However, a drawback of using k-means cluster centers as representative points is that such points distort the…