Related papers: Templating Shuffles

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity, Peyton Chandarana, Ramtin Zand

Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

The shift to data-intensive processing from the cloud to the edge has introduced new challenges and expectations for the next generation of intelligent computing systems. As the memory wall continues to grow, modern systems can only meet…

Hardware Architecture · Computer Science 2026-04-16 Denis Hoornaert, Cole Strickler, Manos Athanassoulis, Marco Caccamo, Heechul Yun, Renato Mancuso

FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud

Shuffle exchanges intermediate results between upstream and downstream operators in distributed data processing and is usually the bottleneck due to factors such as small random I/Os and network contention. Several systems have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-27 Yuhao Lin, Zhipeng Tang, Jiayan Tong, Junqing Xiao, Bin Lu, Yuhang Li, Chao Li, Zhiguo Zhang, Junhua Wang, Hao Luo, James Cheng, Chuang Hu, Jiawei Jiang, Xiao Yan

TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-01 Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

Performance Optimization for Edge-Cloud Serverless Platforms via Dynamic Task Placement

We present a framework for performance optimization in serverless edge-cloud platforms using dynamic task placement. We focus on applications for smart edge devices, for example, smart cameras or speakers, that need to perform processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-21 Anirban Das, Shigeru Imai, Mike P. Wittie, Stacy Patterson

Optimal Service Elasticity in Large-Scale Distributed Systems

A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and…

Probability · Mathematics 2017-06-23 Debankur Mukherjee, Souvik Dhara, Sem Borst, Johan S. H. van Leeuwaarden

Exoshuffle: An Extensible Shuffle Architecture

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-21 Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

Neural-based Modeling for Performance Tuning of Spark Data Analytics

Cloud data analytics has become an integral part of enterprise business operations for data-driven insight discovery. Performance modeling of cloud data analytics is crucial for performance tuning and other critical operations in the cloud.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-21 Khaled Zaouk, Fei Song, Chenghao Lyu, Yanlei Diao

A Relative Study of Task Scheduling Algorithms in Cloud Computing Environment

Cloud Computing is a paradigm of both parallel processing and distributed computing. It offers computing facilities as a utility service in a pay-per-use manner. Virtualization, self service provisioning, elasticity and pay per use are the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-20 Syed Arshad Ali, Mansaf Alam

Optimizing Distributed Networking with Big Data Scheduling and Cloud Computing

With the rapid transformation of computer hardware and algorithms, mobile networking has evolved from low data carrying capacity and high latency to better-optimized networks, either by enhancing the digital network or using different…

Networking and Internet Architecture · Computer Science 2023-11-09 Wenbo Zhu

Adaptable TeaStore

Modern cloud-native systems require adapting dynamically to changing operational conditions, including service outages, traffic surges, and evolving user requirements. While existing benchmarks provide valuable testbeds for performance and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Simon Bliudze, Giuseppe De Palma, Saverio Giallorenzo, Ivan Lanese, Gianluigi Zavattaro, Brice Arléon Zemtsop Ndadji

Towards Stochastically Optimizing Data Computing Flows

With rapid growth in the amount of unstructured data produced by memory-intensive applications, large scale data analytics has recently attracted increasing interest. Processing, managing and analyzing this huge amount of data poses several…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-29 Farshid Farhat, Diman Zad Tootaghaj, Mohammad Arjomand

Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability

Current serverless platforms struggle to optimize resource utilization due to their dynamic and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall short, often sacrificing utilization for practicability or…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-04 Qingyuan Liu, Yanning Yang, Dong Du, Yubin Xia, Ping Zhang, Jia Feng, James Larus, Haibo Chen

Tetris: An SLA-aware Application Placement Strategy in the Edge-Cloud Continuum

An Edge-Cloud Continuum integrates edge and cloud resources to provide a flexible and scalable infrastructure. This paradigm can minimize latency by processing data closer to the source at the edge while leveraging the vast computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-04 Lucas Almeida, Maycon Peixoto

Push Down Optimization for Distributed Multi Cloud Data Integration

Enterprises increasingly adopt multi cloud architectures to take advantage of diverse database engines, regional availability, and cost models. In these environments, ETL pipelines must process large, distributed datasets while minimizing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Ravi Kiran Kodali, Vinoth Punniyamoorthy, Akash Kumar Agarwal, Bikesh Kumar, Balakrishna Pothineni, Aswathnarayan Muthukrishnan Kirubakaran, Sumit Saha, Nachiappan Chockalingam

JASPER: Joint Optimization of Scaling, Placement, and Routing of Virtual Network Services

To adapt to continuously changing workloads in networks, components of the running network services may need to be replicated (scaling the network service) and allocated to physical resources (placement) dynamically, also necessitating…

Networking and Internet Architecture · Computer Science 2018-06-15 Sevil Dräxler, Holger Karl, Zoltán Ádám Mann

Space Shuffle: A Scalable, Flexible, and High-Bandwidth Data Center Network

Data center applications require the network to be scalable and bandwidth-rich. Current data center network architectures often use rigid topologies to increase network bandwidth. A major limitation is that they can hardly support…

Networking and Internet Architecture · Computer Science 2017-11-23 Ye Yu, Chen Qian

NetClus: A Scalable Framework for Locating Top-K Sites for Placement of Trajectory-Aware Services

Facility location queries identify the best locations to set up new facilities for providing service to its users. Majority of the existing works in this space assume that the user locations are static. Such limitations are too restrictive…

Databases · Computer Science 2017-04-13 Shubhadip Mitra, Priya Saraf, Richa Sharma, Arnab Bhattacharya, Harsh Bhandari, Sayan Ranu

Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics

Serverless computing has attracted a broad range of applications due to its ease of use and resource elasticity. However, developing serverless applications often poses a dilemma -- relying on general-purpose serverless platforms can fall…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-17 Minchen Yu, Yinghao Ren, Jiamu Zhao, Jiaqi Li

Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics

Control planes of cloud frameworks trade off between scheduling granularity and performance. Centralized systems schedule at task granularity, but only schedule a few thousand tasks per second. Distributed systems schedule hundreds of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-05 Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis