Fast Randomized Semi-Supervised Clustering

Alaa Saade; Florent Krzakala; Marc Lelarge; Lenka Zdeborová

doi:10.1088/1742-6596/1036/1/012015

2018-06-28 (v3). Subjects: Machine Learning; Probability; Statistics Theory

Abstract

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a simple model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from $O(n)$ randomly chosen measurements, where $n$ is the number of items in the dataset. Our algorithm is therefore efficient in terms of both time and space complexity. We also investigate numerically the performance of the algorithm on synthetic and real-world data.
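
To make the idea in the abstract concrete, the following is a minimal sketch of a power iteration of the non-backtracking operator on a sparse graph of random pairwise comparisons, for two clusters. It is an illustration under stated assumptions, not the authors' implementation: the function names `nb_power_iteration` and `demo`, the noise model (each comparison flipped with probability `flip_prob`), the initialisation from the labeled items, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def nb_power_iteration(n, pairs, J, labeled_idx, labeled_val, n_iter=20):
    """Estimate +/-1 labels by iterating the non-backtracking operator.

    pairs: (m, 2) int array of compared items; J: (m,) array of +/-1 outcomes
    (+1 means "same cluster"). Initialisation is an assumption of this sketch.
    """
    m = len(pairs)
    # Each undirected comparison becomes two directed edges src -> dst.
    src = np.concatenate([pairs[:, 0], pairs[:, 1]])
    dst = np.concatenate([pairs[:, 1], pairs[:, 0]])
    w = np.concatenate([J, J]).astype(float)
    rev = (np.arange(2 * m) + m) % (2 * m)  # index of the reversed edge

    # Messages leaving labeled items start at the known label, others at 0.
    known = np.zeros(n)
    known[labeled_idx] = labeled_val
    v = known[src]

    for _ in range(n_iter):
        wv = w * v
        # Sum of weighted messages arriving at every item.
        incoming = np.bincount(dst, weights=wv, minlength=n)
        # Message i -> j uses everything arriving at i except what came from j.
        v = incoming[src] - wv[rev]
        norm = np.linalg.norm(v)
        if norm == 0:
            break
        v /= norm  # rescale to avoid overflow; signs are unaffected

    score = np.bincount(dst, weights=w * v, minlength=n)
    est = np.where(score >= 0, 1, -1)
    est[labeled_idx] = labeled_val  # keep the labels that were given
    return est


def demo(seed=0, n=2000, pairs_per_item=5, flip_prob=0.1, labeled_frac=0.1):
    """Synthetic two-cluster test; returns accuracy on the unlabeled items."""
    rng = np.random.default_rng(seed)
    truth = rng.choice([-1, 1], size=n)
    # O(n) random comparisons, with self-pairs and duplicates dropped.
    raw = rng.integers(0, n, size=(pairs_per_item * n, 2))
    raw = raw[raw[:, 0] != raw[:, 1]]
    pairs = np.unique(np.sort(raw, axis=1), axis=0)
    noise = np.where(rng.random(len(pairs)) < flip_prob, -1, 1)
    J = truth[pairs[:, 0]] * truth[pairs[:, 1]] * noise
    labeled_idx = rng.choice(n, size=int(labeled_frac * n), replace=False)
    est = nb_power_iteration(n, pairs, J, labeled_idx, truth[labeled_idx])
    unlabeled = np.ones(n, dtype=bool)
    unlabeled[labeled_idx] = False
    return float(np.mean(est[unlabeled] == truth[unlabeled]))
```

Calling `demo()` runs the sketch on synthetic data and returns the fraction of unlabeled items classified correctly. Each iteration touches every directed edge once, so the time and memory costs scale with the number of comparisons, which is the sense in which a method of this kind can remain cheap when only $O(n)$ measurements are used.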

Keywords

cluster analysis graph clustering semi-supervised learning

Cite

@article{arxiv.1605.06422,
  title  = {Fast Randomized Semi-Supervised Clustering},
  author = {Alaa Saade and Florent Krzakala and Marc Lelarge and Lenka Zdeborová},
  journal= {arXiv preprint arXiv:1605.06422},
  year   = {2018}
}
