An Efficient Streaming Algorithm for Approximating Graphlet Distributions

Data Structures and Algorithms; 2026-04-29; v1; Databases; Social and Information Networks

Abstract

In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for $k$-graphlet sampling require loading the entire graph into main memory, making them impractical for massive graphs. To bypass this limitation, Bourreau et al. (NeurIPS 2024) introduced a \emph{streaming} algorithm that through nontrivial techniques makes only $O(\log n)$ passes using $O(n \log n)$ memory. In this work we break their $O(\log n)$-pass bound by giving an algorithm that, for any fixed $c>0$, makes $O(1/c)$ passes using $\tilde{O}(n^{1+c})$ memory. As a consequence of their lower bound, our algorithm is optimal up to a factor of $\tilde{O}(n^c)$ in the memory usage. We use this sampling algorithm to obtain an efficient method of approximating $k$-graphlet distributions. Experiments on real-world and synthetic graphs show that our algorithm is always at least as good as the one of Bourreau et al., and outperforms it by orders of magnitude on mildly dense graphs.
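To make the sampling idea concrete, here is a minimal sketch of the classic in-memory approach that the abstract contrasts with, not the streaming algorithm of this paper or of Bourreau et al. It assumes graphlets are connected induced subgraphs, holds the whole graph in memory as a dictionary of neighbor sets, and uses plain rejection sampling. The function names, the `num_trials` parameter, and the use of the sorted degree sequence as a graphlet label are all illustrative choices; that label separates the connected graphlet types only for $k \le 4$, so the sketch is restricted to $k \in \{3, 4\}$.

```python
import random
from collections import Counter


def is_connected(adj, nodes):
    """Check whether the subgraph induced by `nodes` is connected (DFS)."""
    node_set = set(nodes)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        v = stack.pop()
        for w in adj[v] & node_set:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(node_set)


def degree_signature(adj, nodes):
    """Sorted degree sequence inside the induced subgraph.

    Assumption: used as a graphlet label; it only distinguishes all
    connected graphlet types for k <= 4.
    """
    node_set = set(nodes)
    return tuple(sorted(len(adj[v] & node_set) for v in nodes))


def estimate_graphlet_distribution(adj, k, num_trials, rng):
    """Estimate relative frequencies of connected k-graphlets.

    Draws `num_trials` uniform k-subsets of vertices and keeps the
    connected ones, so accepted samples are uniform over connected
    induced k-vertex subgraphs. Needs the whole graph in memory.
    """
    if k not in (3, 4):
        raise ValueError("this sketch only labels graphlets for k in {3, 4}")
    vertices = list(adj)
    counts = Counter()
    for _ in range(num_trials):
        nodes = rng.sample(vertices, k)
        if is_connected(adj, nodes):
            counts[degree_signature(adj, nodes)] += 1
    accepted = sum(counts.values())
    if accepted == 0:
        return {}
    return {sig: c / accepted for sig, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle 0-1-2 with a pendant path 2-3-4.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
    print(estimate_graphlet_distribution(toy, 3, 2000, random.Random(1)))
```

On sparse graphs most random $k$-subsets are disconnected and get rejected, and the adjacency structure must fit in memory; both are limitations of this naive baseline that motivate more careful sampling schemes and the streaming setting discussed above.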

Cite

@article{arxiv.2604.25400,
  title  = {An Efficient Streaming Algorithm for Approximating Graphlet Distributions},
  author = {Marco Bressan and T-H. Hubert Chan and Qipeng Kuang and Mauro Sozio},
  journal= {arXiv preprint arXiv:2604.25400},
  year   = {2026}
}