Related papers: Optimizing Tail Latency in Commodity Datacenters u…
Datacenter applications demand both low latency and high throughput; while interactive applications (e.g., Web Search) demand low tail latency for their short messages due to their partition-aggregate software architecture, many…
In this paper, we consider how to provide fast estimates of flow-level tail latency performance for very large scale data center networks. Network tail latency is often a crucial metric for cloud application performance that can be affected…
Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently…
The rapid growth in the size of large language models has necessitated the partitioning of computational workloads across accelerators such as GPUs, TPUs, and NPUs. However, these parallelization strategies incur substantial data…
Optimizing tail latency while efficiently managing computational resources is crucial for delivering high-performance, latency-sensitive services in edge computing. Emerging applications, such as augmented reality, require low-latency…
Distributed storage systems are known to be susceptible to long tails in response time. In modern online storage systems such as Bing, Facebook, and Amazon, the long tails of the service latency are of particular concern, with 99.9th…
Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing extremely challenging and a critical…
Today, distributed applications are commonly spread across multiple datacenters, reaching out to edge and fog computing locations. The transition away from single datacenter hosting is driven by capacity constraints in…
Effective congestion control for data center networks is becoming increasingly challenging with a growing amount of latency sensitive traffic, much fatter links, and extremely bursty traffic. Widely deployed algorithms, such as DCTCP and…
Modern latency-critical online services such as search engines often process requests by consulting large input data spanning massive parallel components. Hence the tail latency of these components determines the service latency. To trade…
Data center networks need to provide low latency, especially at the tail, as demanded by many interactive applications. To improve tail latency, existing approaches require modifications to switch hardware and/or end-host operating systems,…
Application tail latency is a key metric for many services, with high latencies being linked directly to loss of revenue. Modern deeply-nested micro-service architectures exacerbate tail latencies, increasing the likelihood of users…
Mobile edge computing (MEC) can effectively reduce the latency of cloud computing. However, an edge server may fail due to hardware or software issues. When an edge server fails, the users who offload tasks to this server…
As link speeds increase in datacenter networks, existing congestion control algorithms become less effective in providing fast convergence. TCP-based algorithms that probe for bandwidth take a long time to reach the fair-share and lead to…
Over the past years, TCP has gone through numerous updates to provide performance enhancement under diverse network conditions. However, with respect to losses, little can be achieved with legacy TCP detection and recovery mechanisms. Both…
We consider the transmission of a delay-sensitive data stream from a single source to a single destination. The reliability of this transmission may suffer from bursty packet losses, the predominant type of failure in today's Internet. An…
Forward Error Correction (FEC) remains essential for protecting video streaming against packet loss, yet most real deployments still rely on static, coarse-grained configurations that cannot react to rapid shifts in loss rate, goodput, or…
Cloud systems have rapidly expanded worldwide in the last decade, shifting computational tasks to cloud servers where clients submit their requests. Among cloud workloads, latency-critical applications -- characterized by high-percentile…
Production data centers operate under various workload sizes ranging from latency-sensitive mice flows to long-lived elephant flows. However, the predominant load balancing scheme in data center networks, equal-cost multi-path (ECMP), is…
In the realm of edge computing, the increasing demand for high Quality of Service (QoS), particularly in dynamic multimedia streaming applications (e.g., Augmented Reality/Virtual Reality and online gaming), has prompted the need for…