Testing Network Structure Using Relations Between Small Subgraph Probabilities

Methodology; Social and Information Networks; Statistics Theory. 2017-04-25, v1

Abstract

We study the problem of testing for structure in networks using relations between the observed frequencies of small subgraphs. We consider the statistics
\begin{align*}
T_3 &= (\text{edge frequency})^3 - \text{triangle frequency} \\
T_2 &= 3(\text{edge frequency})^2(1-\text{edge frequency}) - \text{V-shape frequency}
\end{align*}
and prove a central limit theorem for $(T_2, T_3)$ under an Erdős-Rényi null model. We then analyze the power of the associated $\chi^2$ test statistic under a general class of alternative models. In particular, when the alternative is a $k$-community stochastic block model, with $k$ unknown, the power of the test approaches one. Moreover, the signal-to-noise ratio required is strictly weaker than that required for community detection. We also study the relation with other statistics over three-node subgraphs, and analyze the error under two natural algorithms for sampling small subgraphs. Together, our results show how global structural characteristics of networks can be inferred from local subgraph frequencies, without requiring the global community structure to be explicitly estimated.
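As an illustration, the two statistics in the abstract can be computed directly from an adjacency matrix. The following is a minimal Python sketch, not the authors' code. It assumes an undirected simple graph given as a symmetric 0/1 NumPy array with zero diagonal, normalizes the edge count by the number of node pairs and the triangle and V-shape counts by the number of node triples, and takes a V-shape to mean a node triple with exactly two edges. These conventions and the function name `subgraph_statistics` are assumptions, since the abstract does not spell them out. The $\chi^2$ test itself is not sketched, because it needs the limiting covariance of $(T_2, T_3)$, which the abstract does not give.

```python
from math import comb

import numpy as np


def subgraph_statistics(adj):
    """Return (T_2, T_3) for an undirected simple graph.

    Assumed conventions (not stated in the abstract):
      * adj is a symmetric 0/1 matrix with zero diagonal and n >= 3 nodes
      * edge frequency     = #edges     / C(n, 2)
      * triangle frequency = #triangles / C(n, 3)
      * V-shape frequency  = #triples with exactly two edges / C(n, 3)
    """
    adj = np.asarray(adj, dtype=np.int64)
    n = adj.shape[0]
    degrees = adj.sum(axis=1)

    # Each edge contributes to two node degrees.
    n_edges = int(degrees.sum()) // 2

    # trace(A^3) counts closed walks of length 3; each triangle yields 6.
    n_triangles = int(np.trace(adj @ adj @ adj)) // 6

    # Paths of length two centred at each node, triangles included.
    n_wedges = int((degrees * (degrees - 1) // 2).sum())

    # A triangle contains three such paths; remove them to keep only
    # triples with exactly two edges.
    n_vshapes = n_wedges - 3 * n_triangles

    edge_freq = n_edges / comb(n, 2)
    triangle_freq = n_triangles / comb(n, 3)
    vshape_freq = n_vshapes / comb(n, 3)

    t3 = edge_freq**3 - triangle_freq
    t2 = 3 * edge_freq**2 * (1 - edge_freq) - vshape_freq
    return t2, t3
```

Under an Erdős-Rényi graph with edge probability $p$, a given triple forms a triangle with probability $p^3$ and a V-shape with probability $3p^2(1-p)$, so both returned values should be close to zero for large graphs drawn from that model. Values far from zero are the kind of deviation the test is designed to pick up.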

Cite

@article{arxiv.1704.06742,
  title  = {Testing Network Structure Using Relations Between Small Subgraph Probabilities},
  author = {Chao Gao and John Lafferty},
  journal= {arXiv preprint arXiv:1704.06742},
  year   = {2017}
}