Exact Single-Source SimRank Computation on Large Graphs

Data Structures and Algorithms 2020-06-19 v3

Abstract

SimRank is a popular measure of node-to-node similarity based on graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, the Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision of up to 7 decimal places within a reasonable query time.
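
For readers unfamiliar with the Power Method baseline mentioned above, the following is a minimal sketch of the standard SimRank recurrence (a node is fully similar to itself, and two distinct nodes are similar to the extent that their in-neighbors are similar). It is not ExactSim and is not taken from the paper: the function name `simrank_power_method`, the `in_neighbors` input format, the decay factor `c = 0.6`, and the iteration count are all illustrative assumptions. The sketch keeps a score for every pair of nodes, so memory grows quadratically with the number of nodes, which suggests why this approach stops being practical on large graphs.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by fixed-point iteration (illustrative sketch).

    in_neighbors: dict mapping each node to a list of its in-neighbors
                  (hypothetical input format, assumed for this example).
    c:            decay factor in (0, 1); 0.6 is an assumed value.
    iterations:   number of update rounds (assumed; more rounds, less error).
    """
    nodes = list(in_neighbors)
    # Start from the identity: s(u, u) = 1 and s(u, v) = 0 for u != v.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}

    for _ in range(iterations):
        new_sim = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[u][v] = 1.0
                    continue
                in_u, in_v = in_neighbors[u], in_neighbors[v]
                if not in_u or not in_v:
                    # By definition, the score is 0 if either node
                    # has no in-neighbors.
                    new_sim[u][v] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors,
                # scaled by the decay factor.
                total = sum(sim[a][b] for a in in_u for b in in_v)
                new_sim[u][v] = c * total / (len(in_u) * len(in_v))
        sim = new_sim
    return sim


def single_source_top_k(sim, source, k):
    """Return the k nodes most similar to `source`, excluding itself."""
    scores = [(v, s) for v, s in sim[source].items() if v != source]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Tiny example graph with edges a -> b and a -> c.
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank_power_method(graph)
top = single_source_top_k(scores, "b", 1)
```

In this toy graph, `b` and `c` share their only in-neighbor `a`, so the recurrence gives them a score equal to the decay factor, while `a` has no in-neighbors and scores 0 against every other node.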

Cite

@article{arxiv.2004.03493,
  title   = {Exact Single-Source SimRank Computation on Large Graphs},
  author  = {Hanzhi Wang and Zhewei Wei and Ye Yuan and Xiaoyong Du and Ji-Rong Wen},
  journal = {arXiv preprint arXiv:2004.03493},
  year    = {2020}
}

Comments

ACM SIGMOD 2020