Curse of Dimensionality in Pivot-based Indexes

Ilya Volnyansky; Vladimir Pestov

doi:10.1109/SISAP.2009.9


Data Structures and Algorithms, 2016-11-17 (v2)



Abstract

We offer a theoretical validation of the curse of dimensionality in pivot-based indexing of datasets for similarity search by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. We study the asymptotic performance of pivot-based indexing schemes on a sequence of datasets modeled as samples $X_d$ drawn i.i.d. from metric spaces $\Omega_d$. The dataset size $n=n_d$ is allowed to grow so that $d$, the "dimension", is superlogarithmic but subpolynomial in $n$, and the number of pivots is allowed to grow as $o(n/d)$. We adopt the least restrictive cost model of similarity search, counting each distance calculation as a single computation and disregarding all other operations. We demonstrate that if the intrinsic dimension of the spaces $\Omega_d$, in the sense of the concentration of measure phenomenon, is $O(d)$, then the performance of pivot-based similarity search indexes is asymptotically linear in $n$.
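The pivot-based scheme and the cost model described in the abstract can be illustrated with a small experiment. The sketch below is a hypothetical LAESA-style pivot table, not the authors' construction: it precomputes the distances from every data point to a few pivots, then answers a range query by discarding points whose triangle-inequality lower bound $|d(q,p) - d(p,x)|$ exceeds the radius. Only query-time distance computations are counted, matching the abstract's cost model, and the count is compared with the $n$ computations of a linear scan. The names (`PivotIndex`, `experiment`), the uniform data in $[0,1]^d$, the random pivot choice and the radius rule are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical LAESA-style pivot table (illustrative, not the paper's scheme).

    Preprocessing stores d(p, x) for every pivot p and data point x; these
    distances are not charged to queries.
    """

    def __init__(self, points, pivots, dist=euclidean):
        self.points = points
        self.pivots = pivots
        self.dist = dist
        self.table = [[dist(p, x) for p in pivots] for x in points]

    def range_query(self, q, r, eps=1e-9):
        """Return (indices of points within distance r of q, distance computations used)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        calls = len(self.pivots)
        result = []
        for i, x in enumerate(self.points):
            # Triangle inequality: d(q, x) >= |d(q, p) - d(p, x)| for every pivot p.
            lower = max((abs(dq - dx) for dq, dx in zip(q_to_pivots, self.table[i])),
                        default=0.0)
            if lower > r + eps:  # small slack guards against rounding error
                continue  # x is discarded without computing d(q, x)
            calls += 1
            if self.dist(q, x) <= r:
                result.append(i)
        return result, calls


def linear_scan(points, q, r, dist=euclidean):
    """Baseline: n distance computations, one per data point."""
    return [i for i, x in enumerate(points) if dist(q, x) <= r]


def experiment(d, n=1000, k=16, seed=0):
    """Fraction of the linear-scan cost (n) paid by one pivot-based range query."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    pivots = rng.sample(pts, k)  # assumption: pivots chosen at random from the data
    index = PivotIndex(pts, pivots)
    q = [rng.random() for _ in range(d)]
    # Assumption: radius set so that about 1% of the points lie in the ball.
    r = sorted(euclidean(q, x) for x in pts)[n // 100]
    found, calls = index.range_query(q, r)
    assert found == linear_scan(pts, q, r)  # filtering never loses answers
    return calls / n


if __name__ == "__main__":
    for d in (2, 10, 50):
        print(f"d={d:3d}: {experiment(d):.3f} of linear-scan cost")
```

Running the script prints, for each $d$, the ratio of query-time distance computations to $n$; the ratio can slightly exceed 1 because the $k$ query-to-pivot distances are always paid. Under these assumptions one would expect the ratio to be well below 1 for $d = 2$ and to approach 1 as $d$ grows, consistent with the behavior the abstract describes; the exact values depend on the seed, the radius and the pivot choice.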

Keywords

dimensionality reduction, metric space, approximation algorithm

Cite

@article{arxiv.0906.0391,
  title  = {Curse of Dimensionality in Pivot-based Indexes},
  author = {Ilya Volnyansky and Vladimir Pestov},
  journal= {arXiv preprint arXiv:0906.0391},
  year   = {2016}
}

Comments

9 pp., 4 figures, LaTeX2e. A revised submission to the 2nd International Workshop on Similarity Search and Applications (SISAP), 2009.
