Accelerating System Log Processing by Semi-supervised Learning: A Technical Report

Software Engineering; Computation and Language; Information Retrieval. 2018-11-06, v1

Abstract

There is a growing need for automated tools that analyze the system logs of large-scale online systems in a timely manner. However, the conventional approach of monitoring and classifying log output against a keyword list does not scale well to complex systems whose code is contributed by a large group of developers, who encode error messages in diverse ways and often attach misleading pre-set labels. In this paper, we propose that the design of a large-scale online log analysis system should follow the "Least Prior Knowledge Principle", under which unsupervised or semi-supervised solutions are preferred and only minimal prior knowledge of the logs is encoded directly. Accordingly, we report our experience in designing a two-stage machine-learning-based method, in which system logs are regarded as the output of a quasi-natural language, pre-filtered by a perplexity score threshold, and then passed through a fine-grained classification procedure. Tests on empirical data show that our method has a clear advantage in both processing speed and classification accuracy.
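The two-stage idea can be illustrated with a minimal sketch. The abstract does not say which language model or classifier the authors use, so everything below is an assumption made for illustration: an add-one smoothed bigram model over whitespace tokens stands in for the language model, the fine-grained classifier is a caller-supplied function, and the names `BigramLM`, `two_stage`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model (an assumed stand-in, not the paper's model)."""

    def __init__(self, lines):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for line in lines:
            tokens = ["<s>"] + line.split() + ["</s>"]
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        # Vocabulary size used for smoothing; "<unk>" covers unseen tokens.
        self.vocab_size = len(set(self.context_counts) | {"</s>", "<unk>"})

    def perplexity(self, line):
        """Per-token perplexity of one log line under the model."""
        tokens = ["<s>"] + line.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / (len(tokens) - 1))


def two_stage(lines, lm, threshold, classify):
    """Stage 1: drop lines with perplexity <= threshold (they look routine).
    Stage 2: run the (hypothetical) fine-grained classifier on the rest."""
    results = []
    for line in lines:
        score = lm.perplexity(line)
        if score <= threshold:
            continue  # routine output, skip the more expensive classifier
        results.append((line, score, classify(line)))
    return results
```

In this sketch the model is fitted on lines assumed to be routine, so unfamiliar lines receive a higher perplexity and are the only ones forwarded to the classifier. How the threshold is chosen is not stated in the abstract; here it is simply a parameter.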

Cite

@article{arxiv.1811.01833,
  title  = {Accelerating System Log Processing by Semi-supervised Learning: A Technical Report},
  author = {Guofu Li and Pengjia Zhu and Zhiyi Chen},
  journal= {arXiv preprint arXiv:1811.01833},
  year   = {2018}
}