Related papers: A semi-supervised sparse K-Means algorithm
Traditionally, practitioners initialize the {\tt k-means} algorithm with centers chosen uniformly at random. Randomized initialization with uneven weights ({\tt k-means++}) has recently been used to improve the performance over this…
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…
The $k$-means algorithm is arguably the most popular nonparametric clustering method but cannot generally be applied to datasets with incomplete records. The usual practice then is to either impute missing values under an assumed…
Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no…
In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the principle of divide and conquer strategy.…
In many situations where the interest lies in identifying clusters one might expect that not all available variables carry information about these groups. Furthermore, data quality (e.g. outliers or missing entries) might present a serious…
Medical image analysis using supervised deep learning methods remains problematic because of the reliance of deep learning methods on large amounts of labelled training data. Although medical imaging data repositories continue to expand…
Among many clustering algorithms, the K-means clustering algorithm is widely used because of its simple algorithm and fast convergence. However, this algorithm suffers from incomplete data, where some samples have missed some of their…
Feature selection is an important and challenging task in high dimensional clustering. For example, in genomics, there may only be a small number of genes that are differentially expressed, which are informative to the overall clustering…
Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attentions since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational…
We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking…
Semi-supervised clustering methods incorporate a limited amount of supervision into the clustering process. Typically, this supervision is provided by the user in the form of pairwise constraints. Existing methods use such constraints in…
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote…
Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or their…
Many studies in data mining have proposed a new learning called semi-Supervised. Such type of learning combines unlabeled and labeled data which are hard to obtain. However, in unsupervised methods, the only unlabeled data are used. The…
Clustering data is a popular feature in the field of unsupervised machine learning. Most algorithms aim to find the best method to extract consistent clusters of data, but very few of them intend to cluster data that share the same…
Unsupervised feature selection has been always attracting research attention in the communities of machine learning and data mining for decades. In this paper, we propose an unsupervised feature selection method seeking a feature…
Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific…
The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a…