
Optimizing Tail Latency in Commodity Datacenters using Forward Error Correction

Networking and Internet Architecture 2021-10-29 v1

Abstract

Long tail latency of short flows (or messages) greatly affects user-facing applications in datacenters. Prior solutions to the problem introduce significant implementation complexity, such as global state monitoring, complex network control, or non-trivial switch modifications. While they promise superior performance, they are hard to implement in practice. This paper presents CloudBurst, a simple and effective yet readily deployable solution that achieves similar or even better results without introducing these complexities. At its core, CloudBurst leverages forward error correction (FEC) over multipath: it proactively spreads FEC-coded packets generated from each message across multiple paths in parallel, and recovers the message from the first few packets to arrive. As a result, CloudBurst obliviously exploits underutilized paths and thus achieves low tail latency. We have implemented CloudBurst as a user-space library and deployed it on a testbed with commodity switches. Our testbed and simulation experiments show the superior performance of CloudBurst. For example, CloudBurst reduces the 99th-percentile message/flow completion time (FCT) by 63.69% and 60.06% compared to DCTCP and PIAS, respectively.
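The abstract does not state which FEC code or path-assignment policy CloudBurst uses, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a systematic Reed-Solomon-style erasure code over the prime field GF(257), where any k of the n coded packets suffice to rebuild a message. It also uses a made-up delay model in which each path has an exponential random delay and packets sharing a path queue behind one another. The names `encode`, `decode`, and `first_k_arrival` are hypothetical illustrations, not CloudBurst's API. Parity symbols can take the value 256, so packets are kept as lists of ints rather than bytes.

```python
import random

P = 257  # prime modulus; every byte value 0..255 is a field element (assumption of this sketch)


def _lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs[i], ys[i]) mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(message: bytes, k: int, n: int):
    """Split message into k data packets and append n - k parity packets.

    Each packet is (index, symbols). Data packets carry the original bytes at
    indices 0..k-1; parity packet x carries the interpolating polynomial's
    value at x, so any k distinct packets determine the message.
    """
    assert 0 < k <= n <= P
    size = -(-len(message) // k)  # ceiling division
    padded = message.ljust(size * k, b"\0")
    data = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    packets = [(i, data[i]) for i in range(k)]  # systematic part
    xs = list(range(k))
    for x in range(k, n):
        parity = [_lagrange_eval(xs, [data[i][s] for i in range(k)], x) for s in range(size)]
        packets.append((x, parity))
    return packets


def decode(received, k: int, length: int) -> bytes:
    """Rebuild the original message from any k distinct packets."""
    chosen = received[:k]
    xs = [x for x, _ in chosen]
    size = len(chosen[0][1])
    out = bytearray()
    for i in range(k):  # recover data packet i, symbol by symbol
        for s in range(size):
            out.append(_lagrange_eval(xs, [sym[s] for _, sym in chosen], i))
    return bytes(out[:length])  # drop padding


def first_k_arrival(n_paths, packets, k, rng):
    """Spray packets round-robin over paths and return the first k arrivals.

    Hypothetical delay model: each path has an exponential random delay, and
    each extra packet on the same path lands 0.01 time units later.
    """
    path_delay = [rng.expovariate(1.0) for _ in range(n_paths)]
    arrivals = sorted(
        ((path_delay[idx % n_paths] + 0.01 * (idx // n_paths), pkt)
         for idx, pkt in enumerate(packets)),
        key=lambda a: a[0],
    )
    return [pkt for _, pkt in arrivals[:k]], arrivals[k - 1][0]


if __name__ == "__main__":
    msg = b"user-facing RPC response payload"
    pkts = encode(msg, k=4, n=6)
    got, t_done = first_k_arrival(6, pkts, 4, random.Random(7))
    assert decode(got, 4, len(msg)) == msg
    print(f"decoded from first 4 of 6 packets at t={t_done:.3f}")
```

Because the decoder needs any k of the n packets rather than k specific ones, the message completes when the k-th packet arrives on whichever paths happen to be fast. A single congested path therefore no longer sets the completion time. The cost is the n - k redundant packets sent on every message.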


Cite

@article{arxiv.2110.15157,
  title  = {Optimizing Tail Latency in Commodity Datacenters using Forward Error Correction},
  author = {Zeng, Gaoxiong and Chen, Li and Yi, Bairen and Chen, Kai},
  journal= {arXiv preprint arXiv:2110.15157},
  year   = {2021}
}

Comments

13 pages
