Related papers: Tracing Distributed Algorithms Using Replay Clocks
In this work, we focus on the problem of replay clocks (RepCL). The need for replay clocks arises from the observation that analyzing distributed computation for all desired properties of interest may not be feasible in an online…
One typical use case of large-scale distributed computing in data centers is to decompose a computation job into many independent tasks and run them in parallel on different machines, sometimes known as the "embarrassingly parallel"…
Causality in distributed systems is a concept that has long been explored and numerous approaches have been made to use causality as a way to trace distributed system execution. Traditional approaches usually used system profiling and newer…
We present Cyclotron, a framework and compiler for using recurrence equations to express streaming dataflow algorithms, which then get portably compiled to distributed topologies of interlinked processors. Our framework provides an input…
Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared…
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the used scheduling algorithm.…
Querying graph data with low latency is an important requirement in application domains such as social networks and knowledge graphs. Graph queries perform multiple hops between vertices. When data is partitioned and stored across multiple…
Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during the program…
In a cloud computing job with many parallel tasks, the tasks on the slowest machines (straggling tasks) become the bottleneck in the job completion. Computing frameworks such as MapReduce and Spark tackle this by replicating the straggling…
The efficient parallel execution of complex computations requires balancing the workload across processors while minimizing the communication between them. This inherent trade-off is often captured by graph partitioning or DAG scheduling…
After all these years and all these other shared memory programming frameworks, OpenMP is still the most popular one. However, its greater levels of non-deterministic execution makes debugging and testing more challenging. The ability to…
Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging…
Online continual learning (OCL) refers to the ability of a system to learn over time from a continuous stream of data without having to revisit previously encountered training samples. Learning continually in a single data pass is crucial…
We revisit existing linear computation coding (LCC) algorithms, and introduce a new framework that measures the computational cost of computing multidimensional linear functions, not only in terms of the number of additions, but also with…
As computing power is becoming the core productivity of the digital economy era, the concept of Computing and Network Convergence (CNC), under which network and computing resources can be dynamically scheduled and allocated according to…
Replay methods are known to be successful at mitigating catastrophic forgetting in continual learning scenarios despite having limited access to historical data. However, storing historical data is cheap in many real-world settings, yet…
Short TCP flows that are critical for many interactive applications in data centers are plagued by large flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in…
This document is focused on computing systems implemented in technologies that communicate and compute with temporal transients. Although described in general terms, implementations of spiking neural networks are of primary interest. As…
Traditional parallel schedulers running on cluster supercomputers support only static scheduling, where the number of processors allocated to an application remains fixed throughout the execution of the job. This results in…
We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…