Curse of Dimensionality in Pivot-based Indexes

Data Structures and Algorithms 2016-11-17 v2

Abstract

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_d$ picked in i.i.d. fashion from metric spaces $\Omega_d$. We allow the size of the dataset $n=n_d$ to be such that $d$, the "dimension", is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $\Omega_d$ in the sense of concentration of measure phenomenon is $O(d)$, then the performance of similarity search pivot-based indexes is asymptotically linear in $n$.
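
To make the cost model concrete, here is a minimal Python sketch of a generic pivot table with range queries. It is not the construction analysed in the paper. The function names (`euclidean`, `build_pivot_table`, `range_query`), the Euclidean metric, and the choice of range queries are all assumptions made for illustration. Only distance computations performed at query time are counted, matching the cost model described in the abstract.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; stands in for an arbitrary metric (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots, dist):
    """Precompute d(x, p) for every dataset point x and every pivot p."""
    return [[dist(x, p) for p in pivots] for x in data]


def range_query(data, pivots, table, q, radius, dist):
    """Return (matches, cost), where cost counts query-time distance calls.

    Requires at least one pivot.
    """
    # Distances from the query to each pivot: one computation per pivot.
    q_to_pivots = [dist(q, p) for p in pivots]
    cost = len(pivots)
    matches = []
    for x, row in zip(data, table):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p,
        # so the largest such gap is a lower bound on d(q, x).
        lower_bound = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
        if lower_bound > radius:
            continue  # x is discarded without computing d(q, x)
        cost += 1  # x survived the filter, so d(q, x) must be computed
        if dist(q, x) <= radius:
            matches.append(x)
    return matches, cost


def sample_cube(n, dim, rng):
    """Draw n i.i.d. points uniformly from the unit cube (illustrative only)."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]
```

A hypothetical experiment would draw `data = sample_cube(n, dim, random.Random(0))`, take a few points as pivots, and compare `cost / n` across increasing `dim`. A linear scan costs exactly `n` distance computations, so the ratio shows how much the pivot filter saves.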

Cite

@article{arxiv.0906.0391,
  title  = {Curse of Dimensionality in Pivot-based Indexes},
  author = {Ilya Volnyansky and Vladimir Pestov},
  journal= {arXiv preprint arXiv:0906.0391},
  year   = {2016}
}

Comments

9 pp., 4 figures, LaTeX 2e, a revised submission to the 2nd International Workshop on Similarity Search and Applications, 2009
