Parallel Batch-Dynamic $k$d-Trees

Data Structures and Algorithms 2021-12-14 v1 Databases

Abstract

$k$d-trees are widely used in parallel databases to support efficient neighborhood/similarity queries. Supporting parallel updates to $k$d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a $k$d-tree that allows for efficient parallel $k$-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of $k$d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with $n$ points, each batch of $B$ updates takes $O(B\log^2(n+B))$ amortized work and $O(\log(n+B)\log\log(n+B))$ depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious $k$d-tree construction and parallel bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to $34.8\times$ ($28.4\times$ on average) for batch insertions, up to $35.5\times$ ($27.2\times$ on average) for batch deletions, and up to $46.1\times$ ($40.0\times$ on average) for $k$-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for $k$-NN queries. We compare to two baseline $k$d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates.
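
To make the phrase "log-structured set of $k$d-trees" concrete, the following is a minimal sequential sketch of the classic logarithmic method for batch insertion, written under stated assumptions. It is not the authors' implementation: the names `LogStructuredKDTrees`, `build_kdtree`, `collect`, and `knn_search` are hypothetical, level $i$ is assumed to hold either nothing or a static tree over exactly $2^i$ points, and deletions, parallelism, cache-oblivious layout, and bloom filters are all omitted.

```python
import heapq


def build_kdtree(points, depth=0):
    """Build a static kd-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def collect(node, out):
    """Append every point stored in the tree rooted at node to out."""
    if node is not None:
        point, left, right = node
        out.append(point)
        collect(left, out)
        collect(right, out)


def knn_search(node, query, k, heap, depth=0):
    """Update heap, a max-heap of (-squared_distance, point), with the
    k points nearest to query among those seen so far."""
    if node is None:
        return
    point, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, point))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, point))
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_search(near, query, k, heap, depth + 1)
    # Visit the far side only if the splitting plane is closer than
    # the current k-th nearest candidate (or fewer than k were found).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, query, k, heap, depth + 1)


class LogStructuredKDTrees:
    """Hypothetical sketch: levels[i] is None or a tree of 2**i points."""

    def __init__(self):
        self.levels = []

    def insert_batch(self, batch):
        # Works like binary addition: occupied low levels are emptied
        # into the pool, then rebuilt according to the bits of its size.
        pool = list(batch)
        i = 0
        while pool:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is not None:
                collect(self.levels[i], pool)
                self.levels[i] = None
            if (len(pool) >> i) & 1:
                chunk = [pool.pop() for _ in range(1 << i)]
                self.levels[i] = build_kdtree(chunk)
            i += 1

    def knn(self, query, k):
        # Query every level with one shared candidate heap.
        heap = []
        for tree in self.levels:
            knn_search(tree, query, k, heap)
        return [p for _, p in sorted(heap, reverse=True)]
```

In this sketch a $k$-NN query visits every occupied level while sharing a single candidate heap, so at most about $\log_2 n + 1$ static trees are searched. An insertion rebuilds only the low levels that the batch disturbs, which is the source of the amortized bound in the sequential logarithmic method; the paper's parallel bounds rely on techniques not shown here.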

Cite

@article{arxiv.2112.06188,
  title  = {Parallel Batch-Dynamic $k$d-Trees},
  author = {Rahul Yesantharao and Yiqiu Wang and Laxman Dhulipala and Julian Shun},
  journal= {arXiv preprint arXiv:2112.06188},
  year   = {2021}
}