PanJoin: A Partition-based Adaptive Stream Join

Databases 2018-11-14 v1

Abstract

In stream processing, stream join is a critical source of performance bottlenecks. Sliding-window-based stream join provides precise results but consumes considerable computational resources. Current solutions lack support for general join predicates on large windows: existing algorithms and their hardware accelerators are either limited to equi-join or fall back on a nested-loop join to process all requests. In this paper, we present a new algorithm called PanJoin, which achieves high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computation during the probing phase of stream join. We also implement the most hardware-friendly of these data structures, called BI-Sort, on an FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and that it adapts well to highly skewed data.
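
For orientation, the nested-loop baseline that the abstract contrasts against can be sketched as follows. This is not PanJoin and not the authors' code. It is a minimal illustration that assumes count-based sliding windows and two streams labelled "R" and "S"; the function name `nested_loop_stream_join` and its parameters are hypothetical. It shows why arbitrary predicates are easy to support this way but expensive: every incoming tuple scans the entire opposite window.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, Tuple


def nested_loop_stream_join(
    events: Iterable[Tuple[str, int]],
    window_size: int,
    predicate: Callable[[int, int], bool],
) -> Iterator[Tuple[int, int]]:
    """Join two interleaved streams over count-based sliding windows.

    `events` yields (stream_id, value) pairs with stream_id "R" or "S"
    (an assumed input format). Matches are yielded as (r_value, s_value).
    """
    # One bounded window per stream; a full deque drops its oldest tuple.
    windows: Dict[str, Deque[int]] = {
        "R": deque(maxlen=window_size),
        "S": deque(maxlen=window_size),
    }
    for stream_id, value in events:
        other = "S" if stream_id == "R" else "R"
        # Probing phase: scan the whole opposite window, O(window_size)
        # comparisons per incoming tuple regardless of the predicate.
        for candidate in windows[other]:
            r, s = (value, candidate) if stream_id == "R" else (candidate, value)
            if predicate(r, s):
                yield (r, s)
        # Insertion phase: store the new tuple in its own window.
        windows[stream_id].append(value)


if __name__ == "__main__":
    stream = [("R", 1), ("S", 1), ("S", 3), ("R", 3), ("R", 2)]
    # Equi-join (r == s) and a non-equi-join (r < s) over the same input.
    print(list(nested_loop_stream_join(stream, 2, lambda r, s: r == s)))
    print(list(nested_loop_stream_join(stream, 2, lambda r, s: r < s)))
```

The per-tuple cost grows linearly with the window size, which is the cost the abstract says PanJoin's data structures are designed to reduce during probing.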

Cite

@article{arxiv.1811.05065,
  title  = {PanJoin: A Partition-based Adaptive Stream Join},
  author = {Fei Pan and Hans-Arno Jacobsen},
  journal= {arXiv preprint arXiv:1811.05065},
  year   = {2018}
}