In-database connected component analysis
Data Structures and Algorithms
2019-10-18 v2 Distributed, Parallel, and Cluster Computing
Abstract
We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm is a linear-space, randomised algorithm that always terminates with the correct answer but has a stochastic running time: for any $\epsilon > 0$ and any input graph $G = \langle V, E \rangle$, the algorithm terminates after $O(\log |V|)$ SQL queries with probability at least $1 - \epsilon$, which we show empirically to translate to a quasi-linear runtime in practice.
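To make the "number of SQL queries" cost measure concrete, the following is a minimal sketch of connected-component labelling driven entirely by SQL statements. It is not the randomised algorithm described in the abstract: it uses plain minimum-label propagation, which needs a number of rounds proportional to the graph diameter rather than a logarithmic number. The table names `edge` and `label`, the function name `connected_components`, and the use of an in-memory SQLite database as a stand-in for an MPP database are all assumptions made for illustration.

```python
import sqlite3


def connected_components(edges):
    """Map each vertex to the smallest vertex id in its component.

    Illustrative min-label propagation, NOT the paper's algorithm.
    Returns (labels, rounds), where rounds counts propagation queries.
    """
    con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    con.execute("CREATE TABLE edge (u INTEGER, v INTEGER)")
    rows = []
    for u, v in edges:
        # Store both directions (undirected graph) plus self-loops, so a
        # vertex always sees its own label among its neighbours' labels.
        rows += [(u, v), (v, u), (u, u), (v, v)]
    con.executemany("INSERT INTO edge VALUES (?, ?)", rows)
    # Every vertex starts as its own representative.
    con.execute(
        "CREATE TABLE label AS SELECT DISTINCT u AS v, u AS rep FROM edge"
    )
    rounds = 0
    while True:
        rounds += 1
        # One round: each vertex adopts the smallest neighbouring label.
        con.execute(
            """
            CREATE TABLE next_label AS
            SELECT e.u AS v, MIN(l.rep) AS rep
            FROM edge AS e
            JOIN label AS l ON l.v = e.v
            GROUP BY e.u
            """
        )
        changed = con.execute(
            """
            SELECT COUNT(*)
            FROM label AS l
            JOIN next_label AS n ON n.v = l.v
            WHERE n.rep <> l.rep
            """
        ).fetchall()[0][0]
        con.execute("DROP TABLE label")
        con.execute("ALTER TABLE next_label RENAME TO label")
        if changed == 0:
            break
    labels = dict(con.execute("SELECT v, rep FROM label").fetchall())
    con.close()
    return labels, rounds


if __name__ == "__main__":
    labels, rounds = connected_components([(1, 2), (2, 3), (4, 5)])
    print(labels, rounds)
```

On a long path graph this sketch needs roughly as many rounds as the path has vertices, which is the kind of behaviour that a bound on the number of SQL queries is meant to rule out.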
Cite
@article{arxiv.1802.09478,
title = {In-database connected component analysis},
author = {Harald Bögeholz and Michael Brand and Radu-Alexandru Todor},
journal = {arXiv preprint arXiv:1802.09478},
year = {2019}
}
Comments
major revision with new datasets