Related papers: Data Clustering and Visualization with Recursive M…

Data Clustering and Visualization with Recursive Goemans-Williamson MaxCut Algorithm

In this article, we introduce a novel recursive modification to the classical Goemans-Williamson MaxCut algorithm, offering improved performance in vectorized data clustering tasks. Focusing on the clustering of medical publications, we…

Optimization and Control · Mathematics · 2024-08-16 · An Ly, Raj Sawhney, Marina Chugunova
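The excerpt does not describe the recursive modification itself. The sketch below therefore shows only the classical Goemans-Williamson random-hyperplane rounding step that the method builds on. It assumes the unit vectors $v_i$ from the MaxCut SDP relaxation are already available; solving the SDP needs a solver and is omitted. The function names and the `(i, j, w)` edge-list format are hypothetical illustrations, not taken from the paper.

```python
import random


def hyperplane_round(vectors, rng):
    """Classical Goemans-Williamson rounding: draw a random Gaussian direction r
    and place vertex i on the side given by the sign of <v_i, r>."""
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(a * b for a, b in zip(v, r)) >= 0 else -1 for v in vectors]


def cut_weight(labels, edges):
    """Total weight of edges (i, j, w) whose endpoints receive different labels."""
    return sum(w for i, j, w in edges if labels[i] != labels[j])


def best_of_trials(vectors, edges, trials=50, seed=0):
    """Repeat the rounding several times and keep the heaviest cut found."""
    rng = random.Random(seed)
    best_labels, best_value = None, float("-inf")
    for _ in range(trials):
        labels = hyperplane_round(vectors, rng)
        value = cut_weight(labels, edges)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Example: the 4-cycle 0-1-2-3-0 with unit weights. The (assumed) SDP vectors
# alternate between two antipodal directions.
vectors = [(1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
labels, value = best_of_trials(vectors, edges)
```

In this example, every random hyperplane separates the two antipodal directions, so the rounding recovers the alternating cut of weight 4. A single rounding already attains at least about 0.878 of the SDP value in expectation. Keeping the best of several trials is a common practical addition.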

Clustering with Semidefinite Programming and Fixed Point Iteration

We introduce a novel method for clustering using a semidefinite programming (SDP) relaxation of the Max k-Cut problem. The approach is based on a new methodology for rounding the solution of an SDP relaxation using iterated linear…

Optimization and Control · Mathematics · 2022-07-07 · Pedro Felzenszwalb, Caroline Klivans, Alice Paul

Adaptive Clustering through Semidefinite Programming

We analyze the clustering problem through a flexible probabilistic model that aims to identify an optimal partition on the sample $X_1, \ldots, X_n$. We perform exact clustering with high probability using a convex semidefinite estimator that…

Statistics Theory · Mathematics · 2017-05-19 · Martin Royer

Clustering of Big Data with Mixed Features

Clustering large, mixed data is a central problem in data mining. Many approaches adopt the idea of k-means, and hence are sensitive to initialisation, detect only spherical clusters, and require the unknown number of clusters to be specified a priori. We…

Machine Learning · Statistics · 2020-11-13 · Joshua Tobin, Mimi Zhang

Clustering a Mixture of Gaussians with Unknown Covariance

We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and…

Machine Learning · Statistics · 2021-11-30 · Damek Davis, Mateo Díaz, Kaizheng Wang

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances between data points and only a subset of the cluster centres. Our…

Machine Learning · Computer Science · 2022-03-30 · Georgios Exarchakis, Omar Oubari, Gregor Lenz

Memory-Efficient Approximation Algorithms for Max-k-Cut and Correlation Clustering

Max-k-Cut and correlation clustering are fundamental graph partitioning problems. For a graph G=(V,E) with n vertices, the methods with the best approximation guarantees for Max-k-Cut and the Max-Agree variant of correlation clustering…

Optimization and Control · Mathematics · 2021-10-28 · Nimita Shinde, Vishnu Narayanan, James Saunderson

A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering

Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display…

Optimization and Control · Mathematics · 2024-12-06 · Antonio M. Sudoso

k-MS: A novel clustering algorithm based on morphological reconstruction

This work proposes a clustering algorithm called k-Morphological Sets (k-MS), based on morphological reconstruction and heuristics. k-MS is faster than the CPU-parallel k-Means in worst-case scenarios and produces enhanced…

Machine Learning · Computer Science · 2022-08-31 · É. O. Rodrigues, L. Torok, P. Liatsis, J. Viterbo, A. Conci

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

The problem of estimating the number of clusters (say k) is one of the major challenges for the partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the…

Machine Learning · Computer Science · 2025-01-28 · Duy-Tai Dinh, Tsutomu Fujinami, Van-Nam Huynh
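The excerpt is truncated before it describes how k-SCC works. The sketch below illustrates only the general idea named in the title: score candidate clusterings with the standard silhouette coefficient and pick the k with the highest mean score. It uses these per-record quantities:

- $a(i)$: the mean distance from record $i$ to the other members of its own cluster.
- $b(i)$: the smallest mean distance from $i$ to any other cluster.
- $s(i) = (b(i) - a(i)) / \max(a(i), b(i))$.

Several choices here are assumptions, not necessarily what the paper uses: the simple-matching (Hamming) distance for categorical records, the singleton convention $s(i) = 0$, and the helper names. The candidate labelings are assumed to come from some categorical clustering algorithm, which is not shown.

```python
def matching_distance(x, y):
    """Simple-matching (Hamming) distance: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))


def mean_silhouette(records, labels, dist=matching_distance):
    """Average silhouette s(i) = (b - a) / max(a, b) over all records."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, x in enumerate(records):
        # Sum and count of distances from record i to each cluster, excluding i itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, y in enumerate(records):
            if j != i:
                sums[labels[j]] += dist(x, y)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]  # mean distance within the own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / len(records)


def choose_k(records, labelings):
    """Return the k whose candidate labeling has the highest mean silhouette."""
    return max(labelings, key=lambda k: mean_silhouette(records, labelings[k]))


records = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = choose_k(records, candidates)  # 2 (mean silhouette 1.0 vs 0.5)
```

Each silhouette evaluation here costs $O(n^2)$ distance computations. That cost motivates accelerated variants such as the CAS method listed further down.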

Clustering using Max-norm Constrained Optimization

We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than the previously suggested nuclear-norm relaxation, and study the effectiveness of our method,…

Machine Learning · Computer Science · 2012-04-16 · Ali Jalali, Nathan Srebro

Balanced k-Means and Min-Cut Clustering

Clustering is an effective technique in data mining for generating groups of interest. Among clustering approaches, the k-means and min-cut families of algorithms have gained the most popularity due to their…

Machine Learning · Computer Science · 2014-11-25 · Xiaojun Chang, Feiping Nie, Zhigang Ma, Yi Yang

Sketch-and-solve approaches to k-means clustering by semidefinite programming

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a…

Machine Learning · Computer Science · 2022-11-30 · Charles Clum, Dustin G. Mixon, Soledad Villar, Kaiying Xie
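The Peng-Wei relaxation named in this abstract is commonly written with the squared-distance matrix $D_{ij} = \|x_i - x_j\|^2$. A partition $C_1, \ldots, C_k$ is encoded as $Z = \sum_t \frac{1}{|C_t|} \mathbf{1}_{C_t}\mathbf{1}_{C_t}^\top$, whose k-means cost equals $\tfrac12\langle D, Z\rangle$. The relaxation then minimizes this objective over all $Z \succeq 0$ with $Z \ge 0$ entrywise, $Z\mathbf{1} = \mathbf{1}$ and $\operatorname{tr} Z = k$.

The sketch below does not solve the SDP, which needs a solver. It does not implement the sketch-and-solve step either. It only checks, for one partition, that the lifted matrix satisfies the constraints and reproduces the k-means cost. The helper names and toy points are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_cost(points, labels):
    """k-means objective: sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        cost += sum(sq_dist(p, mean) for p in members)
    return cost


def partition_matrix(labels):
    """Lifted partition Z = sum_t (1/|C_t|) 1_{C_t} 1_{C_t}^T as a nested list."""
    size = {c: labels.count(c) for c in set(labels)}
    n = len(labels)
    return [[1.0 / size[labels[i]] if labels[i] == labels[j] else 0.0
             for j in range(n)] for i in range(n)]


def sdp_objective(points, Z):
    """Relaxation objective (1/2) <D, Z> with D_ij = ||x_i - x_j||^2."""
    n = len(points)
    return 0.5 * sum(sq_dist(points[i], points[j]) * Z[i][j]
                     for i in range(n) for j in range(n))


points = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 2.0), (5.0, 4.0)]
labels = [0, 0, 1, 1, 1]
Z = partition_matrix(labels)
row_sums = [sum(row) for row in Z]           # each should be 1 (Z 1 = 1)
trace = sum(Z[i][i] for i in range(len(Z)))  # should equal k = 2 (tr Z = k)
```

Every partition yields a feasible $Z$ with the same objective value. The SDP optimum is therefore a lower bound on the k-means optimum. When the SDP solution is itself a partition matrix, it identifies an optimal clustering, which fits the separated-data case the abstract describes.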

CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields such as bioinformatics, social network analysis, and image processing. However, clustering accuracy…

Machine Learning · Computer Science · 2025-07-14 · Krishnendu Das, Sumit Gupta, Awadhesh Kumar

Clustering large 3D volumes: A sampling-based approach

In many applications of X-ray computed tomography, an unsupervised segmentation of the reconstructed 3D volumes forms an important step in the image processing chain for further investigation of the digitized object. Therefore, the goal is…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-09 · Thomas Lang

Clustering Plotted Data by Image Segmentation

Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar…

Machine Learning · Computer Science · 2021-10-12 · Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

Semi-Supervised Information-Maximization Clustering

Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm. In this paper, we propose a novel semi-supervised clustering algorithm based on the information-maximization principle. The…

Machine Learning · Computer Science · 2013-05-02 · Daniele Calandriello, Gang Niu, Masashi Sugiyama

K-sets+: a Linear-time Clustering Algorithm for Data Points with a Sparse Similarity Measure

In this paper, we first propose a new iterative algorithm, called the K-sets+ algorithm, for clustering data points in a semi-metric space, where the distance measure does not necessarily satisfy the triangle inequality. We show that the…

Data Structures and Algorithms · Computer Science · 2017-05-12 · Cheng-Shang Chang, Chia-Tai Chang, Duan-Shin Lee, Li-Heng Liou

Improved Conic Reformulations for K-means Clustering

In this paper, we show that the popular K-means clustering problem can equivalently be reformulated as a conic program of polynomial size. The resulting convex optimization problem is NP-hard, but amenable to a tractable semidefinite…

Optimization and Control · Mathematics · 2018-07-23 · Madhushini Narayana Prasad, Grani A. Hanasusanto

Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means Clustering

Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice with poor scalability on large…

Machine Learning · Statistics · 2022-02-10 · Yubo Zhuang, Xiaohui Chen, Yun Yang