Related papers: Integrative Dynamic Reconfiguration in a Parallel …
A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, at first the large computational problem is partitioned into several tasks…
We address the joint optimization of multiple stream joins in a scale-out architecture by tailoring prior work on multi-way stream joins to predicate-driven data partitioning schemes. We present an integer linear programming (ILP)…
In parallel iterative applications, computational efficiency is essential for addressing large problems. Load imbalance is one of the major performance degradation factors of parallel applications. Therefore, distributing, cleverly, and as…
Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…
The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…
Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast interconnects, limiting their…
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…
Hybrid parallelism techniques are essential for efficiently training large language models (LLMs). Nevertheless, current automatic parallel planning frameworks often overlook the simultaneous consideration of node heterogeneity and dynamic…
We study the problem of scheduling a general computational DAG on multiple processors in a 2-level memory hierarchy. This setting is a natural generalization of several prominent models in the literature, and it simultaneously captures…
Parallel multiphysics simulations often suffer from load imbalances originating from the applied coupling of algorithms with spatially and temporally varying workloads. It is thus desirable to minimize these imbalances to reduce the time to…
We present here a cost effective framework for a robust scalable and distributed job processing system that adapts to the dynamic computing needs easily with efficient load balancing for heterogeneous systems. The design is such that each…
Furthering our understanding of many of today's interesting problems in plasma physics---including plasma based acceleration and magnetic reconnection with pair production due to quantum electrodynamic effects---requires large-scale kinetic…
In this paper, to analyze end-to-end timing behavior in heterogeneous processor and network environments accurately, we adopt a heterogeneous selection value on communication contention (HSV_CC) algorithm, which can synchronize tasks and…
Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for various tasks such as deriving product insights, providing and improving user…
Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software…
In this paper we consider the operator mapping problem for in-network stream processing applications. In-network stream processing consists in applying a tree of operators in steady-state to multiple data objects that are continually…
We consider the problem of intelligent and efficient task allocation mechanism in large-scale mobile edge computing (MEC), which can reduce delay and energy consumption in a parallel and distributed optimization. In this paper, we study the…
Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant to even the desktop and embedded domains and no longer only to the high performance computing market niche. This…
Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. While randomized hashing on the keys is capable of…
Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize and lay out tasks of the workload with resources that match the requirement of target data rate. The…