Curse of Dimensionality in Pivot-based Indexes
Abstract
We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_d$ picked in i.i.d. fashion from metric spaces $\Omega_d$. We allow the size of the dataset $n=n_d$ to be such that $d$, the ``dimension'', is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $\Omega_d$ in the sense of the concentration of measure phenomenon is $O(d)$, then the performance of similarity search pivot-based indexes is asymptotically linear in $n$.
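To make the cost model concrete, the following is a minimal Python sketch of a generic pivot-table range query that counts only distance computations, as the abstract describes. It is not the authors' construction: the names `PivotIndex`, `range_query`, and `demo` are hypothetical, and the Gaussian data, the Euclidean metric, and the choice of pivots as random data points are assumptions made purely for illustration.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class PivotIndex:
    """Hypothetical pivot table: stores d(x, p) for every point x and pivot p."""

    def __init__(self, data, pivots, dist):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # Precomputed at build time; not charged to the query cost.
        self.table = [[dist(x, p) for p in pivots] for x in data]

    def range_query(self, q, radius):
        """Return (indices within radius of q, number of distance computations)."""
        cost = 0
        q_to_pivots = []
        for p in self.pivots:
            q_to_pivots.append(self.dist(q, p))
            cost += 1
        hits = []
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x) for every pivot p.
            lower = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
            if lower > radius:
                continue  # discarded without computing d(q, x)
            cost += 1
            if self.dist(q, self.data[i]) <= radius:
                hits.append(i)
        return hits, cost


def demo(dim, n=500, k=8, seed=0):
    """Run one nearest-neighbour-radius query on assumed Gaussian data."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    pivots = rng.sample(data, k)
    index = PivotIndex(data, pivots, euclidean)
    q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # The radius is found by brute force here only to set up the experiment.
    radius = min(euclidean(q, x) for x in data)
    hits, cost = index.range_query(q, radius)
    return data, q, radius, hits, cost
```

Calling `demo(dim)` for increasing values of `dim` and comparing the returned `cost` with the dataset size is one informal way to observe the trend the paper proves: as distances concentrate, the pivot lower bounds discard fewer points and the query cost may approach that of a linear scan.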
Cite
@article{arxiv.0906.0391,
title = {Curse of Dimensionality in Pivot-based Indexes},
author = {Ilya Volnyansky and Vladimir Pestov},
journal = {arXiv preprint arXiv:0906.0391},
year = {2009}
}
Comments
9 pp., 4 figures, latex 2e, a revised submission to the 2nd International Workshop on Similarity Search and Applications, 2009