Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Distributed, Parallel, and Cluster Computing 2022-01-19 v1 Artificial Intelligence

Abstract

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods such as pool-based streaming or dynamic sharing to tackle this mismatch, switch memory is still a potential performance bottleneck. Furthermore, we observe that recent works under-utilize switch memory because of the synchronization requirement for aggregator deallocation. To improve switch memory utilization, we propose ESA, an Efficient Switch Memory Scheduler for In-Network Aggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling at the data plane, which improves switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to 1.35×.
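
To make the idea of preemptive, priority-based aggregator allocation concrete, the following is a minimal Python sketch of a fixed pool of aggregator slots. It is an illustration only, not ESA's implementation: the names (`AggregatorPool`, `Slot`, `on_packet`), the slot-indexing hash, and the handling of evicted or rejected traffic (reported as events that a fallback such as an end host would have to finish aggregating) are all assumptions that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Event = Tuple[str, int, int, int, int]  # (kind, job, seq, partial_sum, contributions)


@dataclass
class Slot:
    job: int       # job that currently owns the aggregator
    seq: int       # gradient fragment being summed
    priority: int  # priority carried by the owning job's packets
    total: int     # running sum of the values seen so far
    count: int     # number of workers that have contributed


class AggregatorPool:
    """Hypothetical model of a switch's aggregator memory (assumed design)."""

    def __init__(self, num_slots: int, fan_in: int) -> None:
        self.slots: List[Optional[Slot]] = [None] * num_slots
        self.fan_in = fan_in  # workers per job (assumed equal for all jobs)

    def on_packet(self, job: int, seq: int, priority: int, value: int) -> List[Event]:
        # Assumed indexing scheme: a deterministic hash of (job, seq).
        idx = (job * 31 + seq) % len(self.slots)
        slot = self.slots[idx]
        events: List[Event] = []

        # Collision with a different aggregation task.
        if slot is not None and (slot.job, slot.seq) != (job, seq):
            if priority <= slot.priority:
                # Not important enough to preempt: forward unaggregated.
                return [("passthrough", job, seq, value, 1)]
            # Preempt: hand the partial sum to the fallback and free the slot.
            events.append(("evicted", slot.job, slot.seq, slot.total, slot.count))
            slot = None

        if slot is None:
            slot = Slot(job, seq, priority, 0, 0)
            self.slots[idx] = slot

        slot.total += value
        slot.count += 1

        if slot.count == self.fan_in:
            # All workers contributed: emit the result and release the slot.
            self.slots[idx] = None
            events.append(("complete", job, seq, slot.total, slot.count))
        return events
```

In this sketch a high-priority packet never waits for a lower-priority occupant to finish, which is one way to avoid the idle-but-allocated slots that a synchronized deallocation would leave behind. The cost is that evicted partial sums must be completed elsewhere.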

Cite

@article{arxiv.2201.06398,
  title  = {Efficient Data-Plane Memory Scheduling for In-Network Aggregation},
  author = {Hao Wang and Yuxuan Qin and ChonLam Lao and Yanfang Le and Wenfei Wu and Kai Chen},
  journal= {arXiv preprint arXiv:2201.06398},
  year   = {2022}
}