Related papers: Efficient Data-Plane Memory Scheduling for In-Netw…
Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the…
The impact of transformer networks is booming, yet, they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on modern neural processor hardware. So far,…
Distributed storage systems typically maintain strong consistency between data nodes and metadata nodes by adopting ordered writes: 1) first installing data; 2) then updating metadata to make data visible.We propose SwitchDelta to…
Memory-compute disaggregation promises transparent elasticity, high utilization and balanced usage for resources in data centers by physically separating memory and compute into network-attached resource "blades". However, existing designs…
Many distributed applications adopt a partition/aggregation pattern to achieve high performance and scalability. The aggregation process, which usually takes a large portion of the overall execution time, incurs large amount of network…
Distributed multi-controller deployment is a promising method to achieve a scalable and reliable control plane of Software-Defined Networking (SDN). However, it brings a new challenge for balancing loads on the distributed controllers as…
As the composition of the power grid evolves to integrate more renewable generation, its reliance on distributed energy resources (DER) is increasing. Existing DERs are often controlled with proportional integral (PI) controllers that, if…
There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all…
In a distributed computing system operating according to the map-shuffle-reduce framework, coding data prior to storage can be useful both to reduce the latency caused by straggling servers and to decrease the inter-server communication…
Most conventional Federated Learning (FL) models are using a star network topology where all users aggregate their local models at a single server (e.g., a cloud server). That causes significant overhead in terms of both communications and…
Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…
Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViT) to distribution shifts without additional training data. However, existing TTA methods often incur substantial computational overhead,…
The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the…
Distributed computing has become a common practice nowadays, where the recent focus has been given to the usage of smart networking devices with in-network computing capabilities. State-of-the-art switches with near-line rate computing and…
Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to…
Input data preprocessing is a common bottleneck when concurrently training multimedia machine learning (ML) models in modern systems. To alleviate these bottlenecks and reduce the training time for concurrent jobs, we present Seneca, a data…
Transformers have become the foundation of numerous state-of-the-art AI models across diverse domains, thanks to their powerful attention mechanism for modeling long-range dependencies. However, the quadratic scaling complexity of attention…
The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogeneous and tightly integrated…
The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…
In this paper, we show how to achieve close-to-optimal utility performance in energy harvesting networks with only finite capacity energy storage devices. In these networks, nodes are capable of harvesting energy from the environment. The…