Parallel Batch-Dynamic $k$d-Trees
Abstract
$k$d-trees are widely used in parallel databases to support efficient neighborhood/similarity queries. Supporting parallel updates to $k$d-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a $k$d-tree that allows for efficient parallel $k$-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of $k$d-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with points, each batch of updates takes amortized work and depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious $k$d-tree construction and parallel Bloom filter construction. Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to ( on average) for batch insertions, up to ( on average) for batch deletions, and up to ( on average) for $k$-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for $k$-NN queries. We compare to two baseline $k$d-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates.
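The log-structured idea in the abstract can be illustrated with a small sequential sketch of the classic logarithmic method. This is not the authors' implementation: the class name `LogStructuredPointSet` and its methods are hypothetical, each level is a plain Python list standing in for a static $k$d-tree, queries scan each level by brute force, and deletions, parallelism, and the buffer tree are omitted. It only shows how a batch insertion rebuilds the small levels, like carry propagation in binary addition, while leaving the large levels untouched.

```python
import heapq
from math import dist


class LogStructuredPointSet:
    """Hypothetical sketch: level i is either empty or holds exactly 2**i points."""

    def __init__(self):
        # Each entry stands in for one static kd-tree (here just a list of points).
        self.levels = []

    def __len__(self):
        return sum(len(level) for level in self.levels)

    def batch_insert(self, batch):
        """Insert a batch by merging low levels, like binary carry propagation."""
        pool = list(batch)
        i = 0
        while pool:
            if i == len(self.levels):
                self.levels.append([])
            # Absorb the existing level; the pool size stays a multiple of 2**i.
            pool.extend(self.levels[i])
            self.levels[i] = []
            # If bit i of the pool size is set, rebuild level i with 2**i points.
            if (len(pool) >> i) & 1:
                self.levels[i] = pool[-(1 << i):]
                del pool[-(1 << i):]
            i += 1

    def knn(self, query, k):
        """Query every level, then keep the k closest candidates overall."""
        def by_distance(p):
            return dist(p, query)

        candidates = []
        for level in self.levels:
            # A real structure would run a kd-tree search here instead of a scan.
            candidates.extend(heapq.nsmallest(k, level, key=by_distance))
        return heapq.nsmallest(k, candidates, key=by_distance)


points = LogStructuredPointSet()
points.batch_insert([(float(x), 0.0) for x in range(5)])
points.batch_insert([(float(x), 0.0) for x in range(5, 8)])
nearest = points.knn((2.2, 0.0), 2)
```

Because every level holds a power-of-two number of points, a set of any size occupies only logarithmically many levels, and a query has to visit each of them once. That trade, cheaper updates in exchange for querying several trees, is the one the abstract describes when comparing against the two baselines.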
Cite
@article{arxiv.2112.06188,
title = {Parallel Batch-Dynamic $k$d-Trees},
author = {Rahul Yesantharao and Yiqiu Wang and Laxman Dhulipala and Julian Shun},
journal= {arXiv preprint arXiv:2112.06188},
year = {2021}
}