Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Hao Wang; Yuxuan Qin; ChonLam Lao; Yanfang Le; Wenfei Wu; Kai Chen

Distributed, Parallel, and Cluster Computing 2022-01-19 v1 Artificial Intelligence

Abstract

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods such as pool-based streaming or dynamic sharing to tackle the mismatch, switch memory is still a potential performance bottleneck. Furthermore, we observe that switch memory is under-utilized in recent works because of the synchronization requirement for aggregator deallocation. To improve switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling at the data plane, which improves switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.
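
To make the idea of preemptive, priority-based aggregator allocation concrete, the following is a minimal host-side sketch in Python. It is not the ESA implementation: the class `AggregatorPool`, the slot layout, the hash-based slot selection, the strict "higher priority wins" rule, and the fallback path for evicted or rejected traffic are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One aggregator: a partial sum for a (job, seq) gradient fragment."""
    job: int
    seq: int
    priority: int
    total: float
    count: int


class AggregatorPool:
    """Hypothetical model of a fixed pool of switch aggregators."""

    def __init__(self, size, fan_in):
        self.slots = [None] * size   # None marks a free aggregator
        self.fan_in = fan_in         # workers contributing to each fragment

    def on_packet(self, job, seq, priority, value):
        """Process one gradient packet and return (action, payload)."""
        idx = hash((job, seq)) % len(self.slots)
        slot = self.slots[idx]

        # Free aggregator: allocate it to this fragment.
        if slot is None:
            self.slots[idx] = Slot(job, seq, priority, value, 1)
            return ("stored", None)

        # Same fragment: accumulate, and release once all workers arrived.
        if (slot.job, slot.seq) == (job, seq):
            slot.total += value
            slot.count += 1
            if slot.count == self.fan_in:
                self.slots[idx] = None
                return ("complete", slot.total)
            return ("stored", None)

        # Collision with a lower-priority fragment: preempt it. The evicted
        # partial sum is handed back so a fallback path (assumed here, e.g.
        # an end host) can finish the aggregation.
        if priority > slot.priority:
            self.slots[idx] = Slot(job, seq, priority, value, 1)
            return ("preempted", slot)

        # Collision with an equal- or higher-priority fragment: the packet
        # bypasses the switch aggregator and goes to the fallback path.
        return ("fallback", value)
```

With a single aggregator and two workers per fragment, a packet from a higher-priority job evicts the partial sum of a lower-priority job instead of waiting for that aggregator to be deallocated, which is the behavior the abstract attributes to preemptive allocation.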

Keywords

distributed computing; scheduling; memory hierarchy

Cite

@article{arxiv.2201.06398,
  title  = {Efficient Data-Plane Memory Scheduling for In-Network Aggregation},
  author = {Hao Wang and Yuxuan Qin and ChonLam Lao and Yanfang Le and Wenfei Wu and Kai Chen},
  journal= {arXiv preprint arXiv:2201.06398},
  year   = {2022}
}
