Tree density estimation

Statistics Theory, Machine Learning | 2022-09-23 | v5

Abstract

We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector $\boldsymbol X$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots,d\}$, the tree density $f_T$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_T$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to\infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|\,d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E\left\{\int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|\,d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
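To illustrate the tree-selection step, here is a minimal sketch that is not taken from the paper. It relies on the classical Chow-Liu observation that, when the relevant entropies are finite, minimizing the Kullback-Leibler divergence between $f$ and $f_T$ over spanning trees is the same as maximizing the sum of pairwise mutual informations along the edges of $T$. The function name `max_weight_spanning_tree` and the matrix `mi` are hypothetical; the numbers in `mi` are made-up stand-ins for mutual information estimates, and how the paper actually estimates these quantities from data is not shown here.

```python
from itertools import combinations


def max_weight_spanning_tree(weights):
    """Kruskal's algorithm for a maximum-weight spanning tree.

    `weights` is a symmetric d x d nested list; weights[i][j] stands in for
    an estimate of the mutual information between coordinates i and j
    (an assumption of this sketch, not the paper's estimator).
    Returns a list of d - 1 edges (i, j) with i < j.
    """
    d = len(weights)
    parent = list(range(d))  # union-find forest, one component per vertex

    def find(u):
        # Find the component representative, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Consider candidate edges from heaviest to lightest.
    edges = sorted(
        combinations(range(d), 2),
        key=lambda e: weights[e[0]][e[1]],
        reverse=True,
    )
    tree = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == d - 1:
                break
    return tree


# Made-up symmetric weights for d = 4 coordinates.
mi = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.3],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.3, 0.7, 0.0],
]
tree = max_weight_spanning_tree(mi)
print(tree)
```

With these illustrative weights the selected tree is the path 0-1-2-3. Given such a tree and a choice of root, a tree density would then be assembled from the root marginal and one bivariate conditional density per edge.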

Cite

@article{arxiv.2111.11971,
  title  = {Tree density estimation},
  author = {László Györfi and Aryeh Kontorovich and Roi Weiss},
  journal= {arXiv preprint arXiv:2111.11971},
  year   = {2022}
}