TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Yi Hao; Ayush Jain; Alon Orlitsky; Vaishakh Ravindrakumar

TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Machine Learning 2022-06-22 v2 Machine Learning Statistics Theory Statistics Theory

Authors: Yi Hao , Ayush Jain , Alon Orlitsky , Vaishakh Ravindrakumar

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$ -piece degree- $d$ polynomial, where $t\ge1$ and $d\ge0$ . Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$ , and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$ . Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$ and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$ . We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$ . Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

Keywords

theoretical bounds and convergence sampling algorithms optimal transport

Cite

@article{arxiv.2202.07172,
  title  = {TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm},
  author = {Yi Hao and Ayush Jain and Alon Orlitsky and Vaishakh Ravindrakumar},
  journal= {arXiv preprint arXiv:2202.07172},
  year   = {2022}
}

Comments

19 pages, 12 figures

TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Abstract

Keywords

Cite

Comments

Related papers