Related papers: A semi-supervised sparse K-Means algorithm

200 papers

Traditionally, practitioners initialize the k-means algorithm with centers chosen uniformly at random. Randomized initialization with uneven weights (k-means++) has recently been used to improve the performance over this…

Machine Learning · Statistics 2016-02-02 Jordan Yoder, Carey E. Priebe

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning…

Methodology · Statistics 2014-07-11 Eric Bair

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…

Machine Learning · Statistics 2020-10-23 Zhiyue Zhang, Kenneth Lange, Jason Xu

The $k$-means algorithm is arguably the most popular nonparametric clustering method but cannot generally be applied to datasets with incomplete records. The usual practice then is to either impute missing values under an assumed…

Machine Learning · Statistics 2018-09-11 Andrew Lithio, Ranjan Maitra

Supervised classification can be effective for prediction but sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful but there is no…

Machine Learning · Computer Science 2021-04-27 Vincent Lemaire, Oumaima Alaoui Ismaili, Antoine Cornuéjols, Dominique Gay

In this paper, we present a semi-supervised learning algorithm for the classification of text documents. A method of labeling unlabeled text documents is presented. The method is based on a divide-and-conquer strategy.…

Machine Learning · Computer Science 2017-06-27 Harsha S. Gowda, Mahamad Suhil, D. S. Guru, Lavanya Narayana Raju

In many situations where the interest lies in identifying clusters one might expect that not all available variables carry information about these groups. Furthermore, data quality (e.g. outliers or missing entries) might present a serious…

Machine Learning · Statistics 2012-01-31 Yumi Kondo, Matias Salibian-Barrera, Ruben Zamar

Medical image analysis using supervised deep learning methods remains problematic because of the reliance of deep learning methods on large amounts of labelled training data. Although medical imaging data repositories continue to expand…

Computer Vision and Pattern Recognition · Computer Science 2019-06-11 Euijoon Ahn, Ashnil Kumar, Dagan Feng, Michael Fulham, Jinman Kim

Among the many clustering algorithms, K-means is widely used because of its simplicity and fast convergence. However, it struggles with incomplete data, where some samples are missing some of their…

Machine Learning · Computer Science 2022-12-26 Ali Beikmohammadi

Feature selection is an important and challenging task in high dimensional clustering. For example, in genomics, there may only be a small number of genes that are differentially expressed, which are informative to the overall clustering…

Methodology · Statistics 2019-10-07 Xiangrui Zeng, Hongyu Zheng

Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has recently drawn attention because it addresses the instability of traditional nonconvex clustering methods. Although its computational…

Methodology · Statistics 2019-01-01 Binhuan Wang, Yilong Zhang, Will Wei Sun, Yixin Fang

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking…

Machine Learning · Computer Science 2018-06-28 Alaa Saade, Florent Krzakala, Marc Lelarge, Lenka Zdeborová

Semi-supervised clustering methods incorporate a limited amount of supervision into the clustering process. Typically, this supervision is provided by the user in the form of pairwise constraints. Existing methods use such constraints in…

Machine Learning · Statistics 2016-09-26 Toon Van Craenendonck, Hendrik Blockeel

The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote…

Optimization and Control · Mathematics 2022-07-26 Veronica Piccialli, Anna Russo Russo, Antonio M. Sudoso

Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or their…

Machine Learning · Computer Science 2018-07-11 Ankita Shukla, Gullal Singh Cheema, Saket Anand

Many studies in data mining have proposed a new kind of learning called semi-supervised learning. This type of learning combines unlabeled data with labeled data, which are hard to obtain. In unsupervised methods, by contrast, only unlabeled data are used. The…

Machine Learning · Computer Science 2013-04-16 Badreddine Meftahi, Ourida Ben Boubaker Saidi

Clustering data is a popular task in the field of unsupervised machine learning. Most algorithms aim to find the best method to extract consistent clusters of data, but very few of them aim to cluster data that share the same…

Machine Learning · Computer Science 2022-06-22 Jean-Sébastien Dessureault, Daniel Massicotte

Unsupervised feature selection has attracted research attention in the machine learning and data mining communities for decades. In this paper, we propose an unsupervised feature selection method seeking a feature…

Machine Learning · Computer Science 2015-06-04 Sen Wang, Feiping Nie, Xiaojun Chang, Lina Yao, Xue Li, Quan Z. Sheng

Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific…

Mesoscale and Nanoscale Physics · Physics 2021-03-23 Maria El Abbassi, Jan Overbeck, Oliver Braun, Michel Calame, Herre S. J. van der Zant, Mickael L. Perrin

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as in spectral clustering. Highly sensitive to initialization, however, K-means encounters a…

Machine Learning · Computer Science 2019-06-04 Feiyu Chen, Yuchen Yang, Liwei Xu, Taiping Zhang, Yin Zhang