Do Data Center Network Metrics Predict Application-Facing Performance?
Abstract
Applications that run in large-scale data center networks (DCNs) rely on the DCN's ability to deliver application requests in a performant manner. DCNs expose a complex design and operational space, and network designers and operators care how different options along this space affect application performance. One might run controlled experiments and measure the corresponding application-facing performance, but such experiments become increasingly infeasible at large scale, and simulations risk yielding inaccurate or incomplete results. Instead, we show that we can predict application-facing performance through more easily measured network metrics. For example, network telemetry metrics (e.g., link utilization) can predict application-facing metrics (e.g., transfer latency). Through large-scale measurements of production networks, we study the correlation between the two types of metrics, and construct predictive, interpretable models that serve as a suggestive guideline to network designers and operators. We show that no single network metric is universally the best predictor (even though some prior work has focused on a single predictor). We find that simple linear models often have the lowest error, while queueing-based models are better in a few cases.
Cite
@article{arxiv.2411.06004,
title = {Do Data Center Network Metrics Predict Application-Facing Performance?},
author = {Brian Chang and Jeffrey C. Mogul and Rui Wang and Mingyang Zhang and Aditya Akella},
journal = {arXiv preprint arXiv:2411.06004},
year = {2024}
}
Comments
17 (main body) + 5 (appendix) pages