Limited Random Walk Algorithm for Big Graph Data Clustering

Social and Information Networks 2016-06-22 v1 Physics and Society

Abstract

Graph clustering is an important technique for understanding the relationships between the vertices of a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on a chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner: we use the limited random walk procedure to find attracting vertices in a graph and use them as features to cluster the vertices. In experiments on simulated graph data and real-world big graph data, the proposed method outperforms state-of-the-art methods in solving graph clustering problems. Since the proposed method follows the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment, such as a MapReduce framework. Given enough computing resources, we are capable of clustering graphs with millions of vertices and hundreds of millions of edges in a reasonable time.
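The abstract names the ingredients of the procedure but not their exact form, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes the inflation function is an element-wise power with exponent `r`, that normalization divides by the sum, and that self-loops are added to keep the walk aperiodic. The names `limited_random_walk`, `attracting_vertices`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
def limited_random_walk(adj, seed, r=2.0, max_iter=200, tol=1e-10):
    """Sketch of a limited random walk started at `seed`.

    adj: dict mapping each vertex to a list of its neighbours (undirected).
    Returns a dict vertex -> probability after the walk has settled.
    """
    # Assumption: add a self-loop to every vertex so the walk is aperiodic.
    nbrs = {v: set(ns) | {v} for v, ns in adj.items()}
    x = {seed: 1.0}
    for _ in range(max_iter):
        # Walk step: each vertex spreads its probability evenly to its neighbours.
        y = {}
        for v, p in x.items():
            share = p / len(nbrs[v])
            for u in nbrs[v]:
                y[u] = y.get(u, 0.0) + share
        # Inflation (assumed element-wise power): boosts large entries, damps small ones.
        y = {u: p ** r for u, p in y.items()}
        # Normalization: rescale so the entries sum to one again.
        total = sum(y.values())
        y = {u: p / total for u, p in y.items()}
        # Stop once the distribution no longer changes noticeably.
        change = sum(abs(y.get(v, 0.0) - x.get(v, 0.0)) for v in set(x) | set(y))
        x = y
        if change < tol:
            break
    return x


def attracting_vertices(x, threshold=0.1):
    """Hypothetical feature extraction: vertices holding a large share of probability."""
    return {v for v, p in x.items() if p >= threshold}


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
features = {seed: attracting_vertices(limited_random_walk(graph, seed)) for seed in graph}
```

Each seed's walk is independent of every other seed's, which is what makes the per-seed loop in the last line a natural candidate for parallel execution. Seeds whose walks end on the same attracting vertices could then be grouped into one cluster.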

Cite

@article{arxiv.1606.06450,
  title  = {Limited Random Walk Algorithm for Big Graph Data Clustering},
  author = {Honglei Zhang and Jenni Raitoharju and Serkan Kiranyaz and Moncef Gabbouj},
  journal= {arXiv preprint arXiv:1606.06450},
  year   = {2016}
}

Comments

12 pages, 3 figures, 7 tables, journal paper