Tree density estimation

László Györfi; Aryeh Kontorovich; Roi Weiss

Subjects: Statistics Theory; Machine Learning. Version: v5 (2022-09-23).

Abstract

We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector $\boldsymbol X$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots,d\}$, the tree density $f_T$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_T$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to\infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|\,d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E\left\{\int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|\,d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
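
As an illustration of the tree-selection step, recall the classical Chow-Liu observation: when $f_T$ is built from the true one- and two-dimensional marginals of $f$, minimizing the Kullback-Leibler divergence between $f$ and $f_T$ is equivalent to finding a maximum-weight spanning tree whose edge weights are the pairwise mutual informations $I(X_i;X_j)$. The sketch below is not the authors' construction. It assumes a simple histogram plug-in estimate of mutual information, and the bin count, the function names, and the toy data are all hypothetical choices made for illustration only.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in mutual information (in nats) from a 2-D histogram.

    Assumption: equal-width bins; `bins` is an illustrative choice.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def max_weight_spanning_tree(weights):
    """Kruskal's algorithm with union-find; returns d-1 edges (i, j), i < j."""
    d = weights.shape[0]
    parent = list(range(d))

    def find(u):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = sorted(
        ((weights[i, j], i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                      # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


def estimate_tree(data, bins=8):
    """Estimate pairwise mutual informations, then take the best spanning tree."""
    d = data.shape[1]
    weights = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi = mutual_information(data[:, i], data[:, j], bins)
            weights[i, j] = weights[j, i] = mi
    return max_weight_spanning_tree(weights)


# Toy usage: a 4-dimensional Gaussian Markov chain X0 -> X1 -> X2 -> X3.
rng = np.random.default_rng(0)
n = 5000
chain = np.empty((n, 4))
chain[:, 0] = rng.normal(size=n)
for k in range(1, 4):
    chain[:, k] = chain[:, k - 1] + 0.5 * rng.normal(size=n)

print(sorted(estimate_tree(chain)))
```

For data generated from a Markov chain, the data-processing inequality implies that the population-optimal tree is the chain itself, so with enough samples the printed edges should link consecutive coordinates. Given a selected tree, a density estimate would then be assembled from estimates of the corresponding univariate and bivariate marginals; that step is omitted here.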

Keywords

trees; random trees; density estimation

Cite

@article{arxiv.2111.11971,
  title  = {Tree density estimation},
  author = {László Györfi and Aryeh Kontorovich and Roi Weiss},
  journal= {arXiv preprint arXiv:2111.11971},
  year   = {2022}
}
