A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

Data Structures and Algorithms 2018-11-20 v1

Abstract

In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers, which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However, not much is known about approximate counting of arbitrary graphs $H$. This is in sharp contrast to the closely related subgraph enumeration problem, which has received significant attention in the database community as the database join problem. The AGM bound shows that the maximum number of occurrences of any subgraph $H$ in a graph $G$ with $m$ edges is $O(m^{\rho(H)})$, where $\rho(H)$ is the fractional edge-cover number of $H$, and enumeration algorithms with matching runtime are known for any $H$. We bridge this gap between subgraph counting and subgraph enumeration by designing a sublinear-time algorithm that can estimate the number of occurrences of any subgraph $H$ in $G$, denoted by $\#H$, to within a $(1 \pm \epsilon)$-approximation w.h.p. in $O(\frac{m^{\rho(H)}}{\#H}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$ time. Our algorithm is allowed the standard set of queries for general graphs, namely degree queries, pair queries, and neighbor queries, plus an additional edge-sample query that returns an edge chosen uniformly at random. The performance of our algorithm matches that of Eden et al. [FOCS 2015, STOC 2018] for counting triangles and cliques and extends it to all choices of subgraph $H$ under the additional assumption of edge-sample queries. We further show that our algorithm works for the more general database join size estimation problem and prove a matching lower bound for this problem.
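
As a rough illustration of the quantities in the abstract, the following worked example evaluates $\rho(H)$ for the three families it names (triangles, cliques, stars). The linear-program definition of the fractional edge-cover number used here is the standard one and is not spelled out in the abstract; the symbols $x_e$, $K_k$, and $S_\ell$ are notation introduced only for this sketch.

```latex
% Fractional edge-cover number of H = (V_H, E_H): assign a weight x_e >= 0
% to every edge so that each vertex receives total weight at least 1,
% and minimize the total weight.
\[
  \rho(H) \;=\; \min \sum_{e \in E_H} x_e
  \quad \text{s.t.} \quad
  \sum_{e \ni v} x_e \ge 1 \;\; \forall v \in V_H,
  \qquad x_e \ge 0 \;\; \forall e \in E_H .
\]

% Triangle: x_e = 1/2 on each of the 3 edges covers every vertex exactly once.
\[
  \rho(K_3) = \tfrac{3}{2}
  \quad\Longrightarrow\quad
  \#K_3 = O(m^{3/2}),
  \qquad
  \text{time } O\!\left(\frac{m^{3/2}}{\#K_3}\right) \cdot \mathrm{poly}(\log n, 1/\epsilon).
\]

% k-clique: x_e = 1/(k-1) on all \binom{k}{2} edges gives total weight k/2;
% summing the vertex constraints shows that no solution can do better.
\[
  \rho(K_k) = \binom{k}{2} \cdot \frac{1}{k-1} = \frac{k}{2}.
\]

% Star with \ell leaves: each leaf lies on a single edge, forcing x_e >= 1.
\[
  \rho(S_\ell) = \ell .
\]
```

Substituting any of these values into $O(\frac{m^{\rho(H)}}{\#H}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$ gives the running time stated above for that choice of $H$; the bound is sublinear in $m$ only when $\#H$ is sufficiently large relative to $m^{\rho(H) - 1}$.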

Cite

@article{arxiv.1811.07780,
  title  = {A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling},
  author = {Sepehr Assadi and Michael Kapralov and Sanjeev Khanna},
  journal= {arXiv preprint arXiv:1811.07780},
  year   = {2018}
}