Related papers: Efficient Data-Plane Memory Scheduling for In-Netw…

200 papers

Training machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-01 · Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik

The impact of transformer networks is booming, yet, they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on modern neural processor hardware. So far,…

Hardware Architecture · Computer Science · 2024-06-17 · Steven Colleman, Arne Symons, Victor J. B. Jung, Marian Verhelst

Distributed storage systems typically maintain strong consistency between data nodes and metadata nodes by adopting ordered writes: 1) first installing data; 2) then updating metadata to make data visible. We propose SwitchDelta to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-26 · Junru Li, Qing Wang, Zhe Yang, Shuo Liu, Jiwu Shu, Youyou Lu

Memory-compute disaggregation promises transparent elasticity, high utilization and balanced usage for resources in data centers by physically separating memory and compute into network-attached resource "blades". However, existing designs…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-22 · Seung-seob Lee, Yanpeng Yu, Yupeng Tang, Anurag Khandelwal, Lin Zhong, Abhishek Bhattacharjee

Many distributed applications adopt a partition/aggregation pattern to achieve high performance and scalability. The aggregation process, which usually takes a large portion of the overall execution time, incurs a large amount of network…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-09 · Fan Yang, Zhan Wang, Xiaoxiao Ma, Guojun Yuan, Xuejun An

Distributed multi-controller deployment is a promising method to achieve a scalable and reliable control plane of Software-Defined Networking (SDN). However, it brings a new challenge for balancing loads on the distributed controllers as…

Networking and Internet Architecture · Computer Science · 2018-01-29 · Tao Hu, Julong Lan, Jianhui Zhang, Wei Zhao

As the composition of the power grid evolves to integrate more renewable generation, its reliance on distributed energy resources (DER) is increasing. Existing DERs are often controlled with proportional integral (PI) controllers that, if…

Systems and Control · Electrical Eng. & Systems · 2024-05-14 · Milad Beikbabaei, Brady Alexander, Ashwin Venkataramanan, Ali Mehrizi-Sani

There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-30 · Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

In a distributed computing system operating according to the map-shuffle-reduce framework, coding data prior to storage can be useful both to reduce the latency caused by straggling servers and to decrease the inter-server communication…

Information Theory · Computer Science · 2018-08-22 · Jingjing Zhang, Osvaldo Simeone

Most conventional Federated Learning (FL) models are using a star network topology where all users aggregate their local models at a single server (e.g., a cloud server). That causes significant overhead in terms of both communications and…

Information Theory · Computer Science · 2022-06-30 · Thinh Quang Dinh, Diep N. Nguyen, Dinh Thai Hoang, Pham Tran Vu, Eryk Dutkiewicz

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-21 · Sankalpa Timilsina, Susmit Shannigrahi

Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViT) to distribution shifts without additional training data. However, existing TTA methods often incur substantial computational overhead,…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Yizhe Xiong, Zihan Zhou, Yiwen Liang, Hui Chen, Zijia Lin, Tianxiang Hao, Fan Zhang, Jungong Han, Guiguang Ding

The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the…

Machine Learning · Computer Science · 2021-11-05 · Jun-Liang Lin, Sheng-De Wang

Distributed computing has become a common practice nowadays, where the recent focus has been given to the usage of smart networking devices with in-network computing capabilities. State-of-the-art switches with near-line rate computing and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-01-13 · Raz Segal, Chen Avin, Gabriel Scalosub

Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-04-26 · Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

Input data preprocessing is a common bottleneck when concurrently training multimedia machine learning (ML) models in modern systems. To alleviate these bottlenecks and reduce the training time for concurrent jobs, we present Seneca, a data…

Operating Systems · Computer Science · 2025-11-19 · Omkar Desai, Ziyang Jiao, Shuyi Pei, Janki Bhimani, Bryan S. Kim

Transformers have become the foundation of numerous state-of-the-art AI models across diverse domains, thanks to their powerful attention mechanism for modeling long-range dependencies. However, the quadratic scaling complexity of attention…

Hardware Architecture · Computer Science · 2026-01-29 · Zhenkun Fan, Zishen Wan, Che-Kai Liu, Ashwin Sanjay Lele, Win-San Khwa, Bo Zhang, Meng-Fan Chang, Arijit Raychowdhury

The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogeneous and tightly integrated…

The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…

Hardware Architecture · Computer Science · 2023-12-21 · Alireza Amirshahi, Giovanni Ansaloni, David Atienza

In this paper, we show how to achieve close-to-optimal utility performance in energy harvesting networks with only finite capacity energy storage devices. In these networks, nodes are capable of harvesting energy from the environment. The…

Optimization and Control · Mathematics · 2010-12-10 · Longbo Huang, Michael J. Neely