Related papers: Improved Approximation Algorithms for Relational C…
This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. This is an algorithm that operates directly on a relational database without performing a join to convert it to a matrix whose rows…
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a…
Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be…
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…
$k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains…
The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by…
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in multimedia and computer vision community. Traditional $k$-means is an iterative algorithm---in each iteration new cluster centers are…
Clustering is a key task in machine learning, with $k$-means being widely used for its simplicity and effectiveness. While 1D clustering is common, existing methods often fail to exploit the structure of 1D data, leading to inefficiencies.…
Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to ma- nipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…
The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in applications in bioinformatics, image processing, social network analysis, etc.). As sizes of data sets have grown…
Connected clustering denotes a family of constrained clustering problems in which we are given a distance metric and an undirected connectivity graph $G$ that can be completely unrelated to the metric. The aim is to partition the $n$…
The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster…
Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming…
Bateni et al. has recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong-oracle and weak-oracle in the weak-strong oracle model,…
In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and…
Clustering is a long-standing research problem and a fundamental tool in AI and data analysis. The traditional k-center problem, a fundamental theoretical challenge in clustering, has a best possible approximation ratio of 2, and any…
Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query…