Zero-Shot Cost Models for Distributed Stream Processing

Distributed, Parallel, and Cluster Computing; Databases (2022-07-11, v1)

Abstract

This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for executing queries. A major premise of this work is that the proposed learned model can generalize to the dynamics of streaming workloads out of the box. This means that a model, once trained, can accurately predict performance metrics such as latency and throughput even if the characteristics of the data and workload, or the deployment of operators to hardware, change at runtime. The model can therefore be used to solve tasks such as optimizing the placement of operators to minimize the end-to-end latency of a streaming query or to maximize its throughput, even under varying conditions. Our evaluation on a well-known DSPS, Apache Storm, shows that the model can predict accurately for unseen workloads and queries while generalizing across real-world benchmarks.

Cite

@article{arxiv.2207.03823,
  title  = {Zero-Shot Cost Models for Distributed Stream Processing},
  author = {Roman Heinrich and Manisha Luthra and Harald Kornmayer and Carsten Binnig},
  journal= {arXiv preprint arXiv:2207.03823},
  year   = {2022}
}

Comments

To appear in the Proceedings of the 16th ACM International Conference on Distributed and Event-based Systems (DEBS '22), June 27-30, 2022, Copenhagen, Denmark.