Diffusion-based Time Series Data Imputation for Microsoft 365

Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning (2023-09-07, v1)

Abstract

Reliability is critical for large-scale cloud systems such as Microsoft 365. Cloud failures, such as disk and node failures, threaten service reliability and lead to online service interruptions and economic loss. Existing work focuses on predicting cloud failures so that action can be taken proactively before they occur. However, these approaches suffer from poor data quality, such as missing data during model training and prediction, which limits their performance. In this paper, we focus on improving data quality through data imputation. We propose Diffusion+, a sample-efficient diffusion model that imputes missing data efficiently based on the observed data. Our experiments and application practice show that our model helps improve the performance of the downstream failure prediction task.

Cite

@article{arxiv.2309.02564,
  title  = {Diffusion-based Time Series Data Imputation for Microsoft 365},
  author = {Fangkai Yang and Wenjie Yin and Lu Wang and Tianci Li and Pu Zhao and Bo Liu and Paul Wang and Bo Qiao and Yudong Liu and Mårten Björkman and Saravan Rajmohan and Qingwei Lin and Dongmei Zhang},
  journal= {arXiv preprint arXiv:2309.02564},
  year   = {2023}
}