SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

2021-02-15 (v2) | Machine Learning, Information Theory (math.IT), Statistics Theory

Abstract

Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present SURF, an algorithm for approximating distributions by piecewise polynomials. SURF is: simple, replacing prior complex optimization techniques with a straightforward approximation of each potential polynomial piece through simple empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces; universal, as well-known polynomial-approximation results imply that it accurately approximates a large class of common distributions; robust to distribution misspecification, as for any degree $d \le 8$ it estimates any distribution to an $\ell_1$ distance $< 3$ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces; fast, using optimal sample complexity, running in near sample-linear time, and, if given sorted samples, it may be parallelized to run in sub-linear time. In experiments, SURF outperforms state-of-the-art algorithms.
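To make the phrase "empirical-probability interpolation" concrete, the following is a minimal sketch of one way a single polynomial piece could be fitted: choose the degree-$d$ polynomial whose integral over each of $d+1$ sub-intervals equals the fraction of samples falling there. The equal-width sub-intervals, the function names `interpolate_piece` and `piece_mass`, and the use of NumPy are assumptions made for illustration; this is not the paper's node placement or its implementation, and the divide-and-conquer merging step is not shown.

```python
import numpy as np

def interpolate_piece(samples, a, b, d, n_total):
    """Fit a degree-d polynomial on [a, b] by matching empirical masses.

    Assumption (illustrative only): [a, b] is split into d + 1 equal-width
    sub-intervals, and the polynomial's integral over each sub-interval is
    forced to equal the fraction of all n_total samples that land in it.
    Returns coefficients in ascending powers of x.
    """
    edges = np.linspace(a, b, d + 2)            # d + 1 sub-intervals
    counts, _ = np.histogram(samples, bins=edges)
    masses = counts / n_total                   # empirical probabilities
    k = np.arange(d + 1)
    # A[i, j] = integral of x**j over sub-interval i.
    A = (edges[1:, None] ** (k + 1) - edges[:-1, None] ** (k + 1)) / (k + 1)
    return np.linalg.solve(A, masses)

def piece_mass(coeffs, lo, hi):
    """Integrate the fitted polynomial over [lo, hi]."""
    k = np.arange(len(coeffs))
    return float(np.sum(coeffs * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.beta(2.0, 5.0, size=1000)     # toy data supported on (0, 1)
    coeffs = interpolate_piece(samples, 0.0, 1.0, d=3, n_total=len(samples))
    print("ascending-power coefficients:", coeffs)
    print("mass assigned to [0, 1]:", piece_mass(coeffs, 0.0, 1.0))
```

Because the fit only solves a small $(d+1) \times (d+1)$ linear system per candidate piece, no iterative optimization is needed, which is the sense of "simple" this sketch is meant to illustrate. A polynomial fitted this way is not guaranteed to be non-negative everywhere.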

Cite

@article{arxiv.2002.09589,
  title  = {SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm},
  author = {Yi Hao and Ayush Jain and Alon Orlitsky and Vaishakh Ravindrakumar},
  journal= {arXiv preprint arXiv:2002.09589},
  year   = {2021}
}

Comments

27 pages, 9 figures, 3 tables