Efficient Document Indexing Using Pivot Tree

Information Retrieval 2016-05-24 v1 Machine Learning

Abstract

We present a novel method for efficiently retrieving the top-k nearest neighbors of a document under cosine similarity, where documents are represented in a high-dimensional space of terms. Documents are typically stored as bag-of-words tf-idf vectors, and the cosine similarity between these vectors is one of the most widely used measures of similarity between a pair of documents. However, cosine similarity is not a metric distance measure because it does not satisfy the triangle inequality, so most metric search methods cannot be applied directly. We propose a method for indexing documents using a pivot tree that leads to efficient retrieval. We also study the relationship between precision and efficiency for the proposed method and compare it with a state-of-the-art method for document search based on inner product.
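The claim about the triangle inequality can be checked with a small example. The sketch below is not the pivot tree method; it only illustrates the problem setting. It assumes the commonly used dissimilarity 1 - cosine similarity (the abstract does not name a specific derived distance), and the function names and toy vectors are hypothetical. It also includes a brute-force top-k search, which is the exhaustive baseline that an index is meant to avoid.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    # Assumption: 1 - cosine similarity as the derived dissimilarity.
    return 1.0 - cosine_similarity(u, v)

def top_k_neighbors(query, docs, k):
    """Exhaustive baseline: indices of the k documents most similar to query."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine_similarity(query, docs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-dimensional "documents" (hypothetical term weights).
a = [1.0, 0.0]
b = [1.0, 1.0]
c = [0.0, 1.0]

# Going from a to c directly costs more than going through b,
# which a true metric would not allow.
direct = cosine_distance(a, c)
detour = cosine_distance(a, b) + cosine_distance(b, c)
violates_triangle_inequality = direct > detour

nearest_to_a = top_k_neighbors(a, [a, b, c], k=2)
```

Here the direct dissimilarity between `a` and `c` exceeds the sum of the two legs through `b`, so pruning rules that rely on the triangle inequality are not valid for this dissimilarity.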

Cite

@article{arxiv.1605.06693,
  title  = {Efficient Document Indexing Using Pivot Tree},
  author = {Gaurav Singh and Benjamin Piwowarski},
  journal= {arXiv preprint arXiv:1605.06693},
  year   = {2016}
}

Comments

6 pages, 2 figures
