An Efficient Streaming Algorithm for Approximating Graphlet Distributions
Abstract
In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach to this problem is to sample $k$-graphlets at random. Classic algorithms for $k$-graphlet sampling require loading the entire graph into main memory, making them impractical for massive graphs. To bypass this limitation, Bourreau et al. (NeurIPS 2024) introduced a \emph{streaming} algorithm that, through nontrivial techniques, makes only passes using memory. In this work we break their -pass bound by giving an algorithm that, for any fixed , makes passes using memory. As a consequence of their lower bound, our algorithm is optimal up to a factor of in the memory usage. We use this sampling algorithm to obtain an efficient method for approximating $k$-graphlet distributions. Experiments on real-world and synthetic graphs show that our algorithm is always at least as good as that of Bourreau et al., and outperforms it by orders of magnitude on mildly dense graphs.
Cite
@article{arxiv.2604.25400,
title = {An Efficient Streaming Algorithm for Approximating Graphlet Distributions},
author = {Marco Bressan and T-H. Hubert Chan and Qipeng Kuang and Mauro Sozio},
journal = {arXiv preprint arXiv:2604.25400},
year = {2026}
}