Related papers: Quantile-based clustering
The analysis of continously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…
Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and $K$-means clustering are two approaches but have different strengths and weaknesses.…
Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
Clustering is one of the most important tools for analysis of large datasets, and perhaps the most popular clustering algorithm is Lloyd's algorithm for $k$-means. This algorithm takes $n$ vectors $V=[v_1,\dots,v_n]\in\mathbb{R}^{d\times…
Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate,…
Bayesian models offer great flexibility for clustering applications---Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For…
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…
Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…
Machine learning algorithms perform well on identifying patterns in many different datasets due to their versatility. However, as one increases the size of the dataset, the computation time for training and using these statistical models…
The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive…
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…
$K$-means clustering is one of the most widely-used partitioning algorithm in cluster analysis due to its simplicity and computational efficiency. However, $K$-means does not provide an appropriate clustering result when applying to data…
In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster. This is…
Factorial k-means (FKM) clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that the partition of objects and the low-dimensional subspace reflecting the cluster structure are…
Comparison of three kind of the clustering and find cost function and loss function and calculate them. Error rate of the clustering methods and how to calculate the error percentage always be one on the important factor for evaluating the…
K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…
Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies…
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…
In this work, we aim to solve a practical use-case of unsupervised clustering which has applications in predictive maintenance in the energy operations sector using quantum computers. Using only cloud access to quantum computers, we…