Debugging OpenStack Problems Using a State Graph Approach

Distributed, Parallel, and Cluster Computing 2016-06-21 v1

Abstract

Systems like OpenStack, which integrate many independently developed modules across multiple levels of abstraction, are hard to operate and debug. A major challenge is navigating the complex dependencies and relationships among the states of different modules or subsystems to ensure that these states are correct and consistent. We present a system that captures runtime states and events from the entire OpenStack-Ceph stack and automatically organizes these data into a graph that we call the system operation state graph (SOSG). With SOSG, we can use intuitive graph traversal techniques to solve problems such as reasoning about the state of a virtual machine. Using graph-based anomaly detection, we can also automatically discover hidden problems in OpenStack. We have a scalable implementation of SOSG and evaluate the approach on a 125-node production OpenStack cluster, finding a number of interesting problems.
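To make the graph traversal idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy state graph in which each node carries a kind and a state and each edge means "depends on"; the node names, states, and the `unhealthy_dependencies` helper are all hypothetical and do not come from the paper or from any OpenStack or Ceph API.

```python
from collections import deque

# Hypothetical snapshot of a state graph: node id -> (kind, state).
# These names and states are illustrative, not real OpenStack/Ceph output.
nodes = {
    "vm-1": ("vm", "ACTIVE"),
    "host-7": ("compute_host", "up"),
    "vol-3": ("volume", "in-use"),
    "rbd-3": ("ceph_image", "ok"),
    "osd-12": ("ceph_osd", "down"),
}

# Directed "depends on" edges between states in different subsystems.
edges = {
    "vm-1": ["host-7", "vol-3"],
    "vol-3": ["rbd-3"],
    "rbd-3": ["osd-12"],
}

# Assumed set of states that count as healthy in this toy model.
HEALTHY = {"ACTIVE", "up", "in-use", "ok"}


def unhealthy_dependencies(start):
    """Breadth-first traversal from `start`, returning (path, state)
    for every reachable node whose state is not in HEALTHY."""
    seen = {start}
    queue = deque([(start, [start])])
    problems = []
    while queue:
        node, path = queue.popleft()
        _kind, state = nodes[node]
        if state not in HEALTHY:
            problems.append((path, state))
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return problems


result = unhealthy_dependencies("vm-1")
for path, state in result:
    print(" -> ".join(path), "is", state)
```

In this toy example the virtual machine itself reports a healthy state, yet the traversal surfaces a failed storage daemon three hops away, which is the kind of cross-module reasoning a state graph is meant to support.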

Cite

@article{arxiv.1606.05963,
  title  = {Debugging OpenStack Problems Using a State Graph Approach},
  author = {Yong Xiang and Hu Li and Sen Wang and Wei Xu},
  journal= {arXiv preprint arXiv:1606.05963},
  year   = {2016}
}