Clustering with Distributed Data

Machine Learning 2019-01-03 v1 Optimization and Control Machine Learning

Abstract

We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked $K$-means, or $NK$-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric family of multi-agent clustering objectives (parameterized by $\rho$) and associated distributed $NK$-means algorithms (also parameterized by $\rho$). The $NK$-means algorithm with parameter $\rho$ converges to a set of fixed points relative to the associated multi-agent objective (designated as "generalized minima"). By appropriate choice of $\rho$, the set of generalized minima may be brought arbitrarily close to the set of Lloyd's minima. Thus, the $NK$-means algorithm may be used to compute Lloyd's minima of the collective dataset up to arbitrary accuracy.
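
To make the idea concrete, the following is a minimal sketch of one way a consensus-penalized, distributed $K$-means iteration could look. It is not the authors' $NK$-means update, which the abstract does not spell out. The function name `networked_kmeans_sketch`, the gradient-step form, the step size `step`, and the exact way `rho` weights disagreement between neighbors are all assumptions made for illustration. The sketch only mirrors the properties stated above: each agent keeps its raw data local and exchanges only its current $K$ center estimates with its neighbors.

```python
def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def networked_kmeans_sketch(local_data, neighbors, init_centers, rho,
                            step=0.02, iters=500):
    """Hypothetical consensus-penalized K-means (illustrative assumption only).

    local_data[i]  : list of points (lists of floats) held by agent i
    neighbors[i]   : indices of the agents that agent i can talk to
    init_centers   : K initial centers, shared by every agent
    rho            : weight on disagreement with neighbors' centers
    Returns centers[i][k], agent i's estimate of center k.
    """
    n_agents = len(local_data)
    K = len(init_centers)
    dim = len(init_centers[0])
    centers = [[list(c) for c in init_centers] for _ in range(n_agents)]

    for _ in range(iters):
        new_centers = []
        for i in range(n_agents):
            # Local Lloyd-style assignment: uses only agent i's own data.
            sums = [[0.0] * dim for _ in range(K)]
            counts = [0] * K
            for y in local_data[i]:
                k = nearest(y, centers[i])
                counts[k] += 1
                for d in range(dim):
                    sums[k][d] += y[d]

            updated = []
            for k in range(K):
                row = []
                for d in range(dim):
                    x = centers[i][k][d]
                    # Gradient of the local within-cluster squared error.
                    grad_local = counts[k] * x - sums[k][d]
                    # Disagreement with neighbors: only centers are exchanged.
                    grad_cons = rho * sum(
                        x - centers[j][k][d] for j in neighbors[i]
                    )
                    row.append(x - step * (grad_local + grad_cons))
                updated.append(row)
            new_centers.append(updated)
        centers = new_centers  # all agents update synchronously
    return centers


# Two agents on a single link, each holding three 1-D points.
data = [[[0.0], [0.2], [10.0]], [[0.4], [9.6], [9.8]]]
links = {0: [1], 1: [0]}
result = networked_kmeans_sketch(data, links, [[1.0], [9.0]], rho=10.0)
```

In this toy setup, a larger `rho` pulls the agents' center estimates closer together and closer to the centers Lloyd's algorithm would find on the pooled data, which is the qualitative behavior the abstract attributes to the choice of $\rho$. The step size must be small relative to `rho` and the local cluster sizes for the iteration to remain stable.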

Cite

@article{arxiv.1901.00214,
  title  = {Clustering with Distributed Data},
  author = {Soummya Kar and Brian Swenson},
  journal= {arXiv preprint arXiv:1901.00214},
  year   = {2019}
}