Related papers: Data Clustering and Visualization with Recursive M…
In this article, we introduce a novel recursive modification to the classical Goemans-Williamson MaxCut algorithm, offering improved performance in vectorized data clustering tasks. Focusing on the clustering of medical publications, we…
We introduce a novel method for clustering using a semidefinite programming (SDP) relaxation of the Max k-Cut problem. The approach is based on a new methodology for rounding the solution of an SDP relaxation using iterated linear…
We analyze the clustering problem through a flexible probabilistic model that aims to identify an optimal partition on the sample X_1, ..., X_n. We perform exact clustering with high probability using a convex semidefinite estimator that…
Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require a priori the unknown number of clusters. We…
We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
Max-k-Cut and correlation clustering are fundamental graph partitioning problems. For a graph G=(V,E) with n vertices, the methods with the best approximation guarantees for Max-k-Cut and the Max-Agree variant of correlation clustering…
Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display…
This work proposes a clustering algorithm called k-Morphological Sets (k-MS), based on morphological reconstruction and heuristics. k-MS is faster than the CPU-parallel k-Means in worst-case scenarios and produces enhanced…
The problem of estimating the number of clusters (say k) is one of the major challenges in partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…
We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than previously suggested nuclear-norm relaxation, and study the effectiveness of our method,…
Clustering is an effective technique in data mining for generating groups of interest. Among the various clustering approaches, the families of k-means algorithms and min-cut algorithms have gained the most popularity due to their…
We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a…
Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy…
In many applications of X-ray computed tomography, an unsupervised segmentation of the reconstructed 3D volumes forms an important step in the image processing chain for further investigation of the digitized object. Therefore, the goal is…
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…
Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm. In this paper, we propose a novel semi-supervised clustering algorithm based on the information-maximization principle. The…
In this paper, we first propose a new iterative algorithm, called the K-sets+ algorithm, for clustering data points in a semi-metric space, where the distance measure does not necessarily satisfy the triangle inequality. We show that the…
In this paper, we show that the popular K-means clustering problem can equivalently be reformulated as a conic program of polynomial size. The arising convex optimization problem is NP-hard, but amenable to a tractable semidefinite…
Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice, with poor scalability on large…