Same-Cluster Querying for Overlapping Clusters

Wasim Huleihel; Arya Mazumdar; Muriel Médard; Soumyabrata Pal

2019-10-29 (v1). Subjects: Machine Learning; Data Structures and Algorithms; Information Theory (math.IT)

Abstract

Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given $n$ elements to be clustered into $k$ possibly overlapping clusters, and an oracle that can interactively answer queries of the form "do elements $u$ and $v$ belong to the same cluster?" The goal is to recover the clusters with the minimum number of such queries. This problem has been of recent interest for the case of disjoint clusters. In this paper, we look at the more practical scenario of overlapping clusters, and provide upper bounds (with algorithms) on the sufficient number of queries. We provide algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions. Our algorithms are parameter-free, efficient, and work in the presence of random noise. We also derive information-theoretic lower bounds on the number of queries needed, proving that our algorithms are order-optimal. Finally, we test our algorithms on both synthetic and real-world data, showing their practicality and effectiveness.
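
The query model in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only simulates an oracle and the naive all-pairs baseline. The names `make_oracle` and `query_all_pairs`, the dictionary input format, the reading of "same cluster" as "share at least one cluster", and the noise model (each answer flipped independently with probability `flip_prob`, then fixed on repeat queries) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def make_oracle(memberships, flip_prob=0.0, seed=0):
    """Build a same-cluster query function (hypothetical helper).

    memberships: dict mapping each element to the set of cluster labels
                 it belongs to (assumed input format).
    flip_prob:   probability that an answer is flipped (assumed noise model).
    """
    rng = random.Random(seed)
    cache = {}  # repeated queries return the same (possibly noisy) answer

    def query(u, v):
        key = frozenset((u, v))
        if key not in cache:
            # Assumed semantics: "same cluster" means sharing at least one cluster.
            truth = bool(memberships[u] & memberships[v])
            flipped = rng.random() < flip_prob
            cache[key] = truth != flipped
        return cache[key]

    return query


def query_all_pairs(elements, query):
    """Naive baseline: ask about every unordered pair, n(n-1)/2 queries."""
    return {(u, v): query(u, v) for u, v in combinations(elements, 2)}


# Element 1 lies in both clusters "A" and "B".
memberships = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}, 3: {"C"}}
oracle = make_oracle(memberships)
answers = query_all_pairs(sorted(memberships), oracle)
```

In this toy instance the oracle answers yes for the pairs (0, 1) and (1, 2) but no for (0, 2), so the same-cluster relation is not transitive once clusters overlap. Inferring answers by transitivity, which is possible in the disjoint case, is therefore no longer valid here.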

Keywords

cluster analysis; graph clustering; optimization algorithm

Cite

@article{arxiv.1910.12490,
  title  = {Same-Cluster Querying for Overlapping Clusters},
  author = {Wasim Huleihel and Arya Mazumdar and Muriel Médard and Soumyabrata Pal},
  journal= {arXiv preprint arXiv:1910.12490},
  year   = {2019}
}

Comments

43 pages, accepted at NeurIPS'19
