In-database connected component analysis

Harald Bögeholz; Michael Brand; Radu-Alexandru Todor

Subjects: Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing
Version: v2, 2019-10-18

Abstract

We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm described is a linear-space, randomised algorithm, always terminating with the correct answer but subject to a stochastic running time, such that for any $\epsilon>0$ and any input graph $G=\langle V, E \rangle$ the algorithm terminates after $O(\log |V|)$ SQL queries with probability at least $1-\epsilon$, which we show empirically to translate to a quasi-linear runtime in practice.
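
To make the setting concrete, the following is a minimal sketch of what "connected components via SQL queries" can look like. It is not the randomised algorithm described in the abstract: it is the naive minimum-label propagation baseline, which needs a number of queries proportional to the graph's diameter rather than $O(\log |V|)$. The table names (`edges`, `labels`, `new_labels`), the function name `connected_components`, and the use of an in-memory SQLite database in place of an MPP database are all assumptions made for illustration.

```python
import sqlite3


def connected_components(edge_list):
    """Label each vertex with the smallest vertex id in its component.

    Naive baseline (an assumption for illustration, not the paper's method):
    repeat one SQL query per round until no label changes.
    """
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Store each undirected edge in both directions, plus a self-loop per
    # vertex so that a vertex always counts as its own neighbour.
    vertices = {v for edge in edge_list for v in edge}
    rows = [(a, b) for a, b in edge_list]
    rows += [(b, a) for a, b in edge_list]
    rows += [(v, v) for v in vertices]
    cur.execute("CREATE TABLE edges (a INTEGER, b INTEGER)")
    cur.executemany("INSERT INTO edges VALUES (?, ?)", rows)

    # Initially every vertex is its own label.
    cur.execute("CREATE TABLE labels (v INTEGER, label INTEGER)")
    cur.executemany("INSERT INTO labels VALUES (?, ?)",
                    [(v, v) for v in vertices])

    while True:
        # One propagation round: take the minimum label among neighbours.
        cur.execute("""
            CREATE TABLE new_labels AS
            SELECT e.a AS v, MIN(n.label) AS label
            FROM edges e
            JOIN labels n ON n.v = e.b
            GROUP BY e.a
        """)
        changed = cur.execute("""
            SELECT COUNT(*)
            FROM labels l
            JOIN new_labels n ON n.v = l.v
            WHERE n.label <> l.label
        """).fetchone()[0]
        cur.execute("DROP TABLE labels")
        cur.execute("ALTER TABLE new_labels RENAME TO labels")
        if changed == 0:
            break

    result = dict(cur.execute("SELECT v, label FROM labels").fetchall())
    con.close()
    return result
```

On a long path graph this baseline needs roughly as many rounds as the path has vertices, which is the kind of behaviour a logarithmic bound on the number of queries is meant to avoid.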

Keywords

relational database, parallel algorithm, graph algorithms

Cite

@article{arxiv.1802.09478,
  title  = {In-database connected component analysis},
  author = {Harald Bögeholz and Michael Brand and Radu-Alexandru Todor},
  journal= {arXiv preprint arXiv:1802.09478},
  year   = {2019}
}

Comments

major revision with new datasets
