Related papers: Distributed Phasers
We address the problem of statically checking safety properties (such as assertions or deadlocks) for parameterized phaser programs. Phasers embody a non-trivial and modern synchronization construct used to orchestrate executions of…
We present and compare distributed parallelization strategies for the particle-in-Fourier (PIF) schemes used in kinetic plasma simulations. The different strategies are i) domain decomposition, where both the particles and Fourier modes are…
With the growing model size, deep neural networks (DNN) are increasingly trained over massive GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them to GPUs…
Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared…
Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…
Phasers pose an interesting synchronization mechanism that generalizes many collective synchronization patterns seen in parallel programming languages, including barriers, clocks, and point-to-point synchronization using latches or…
We describe a structured system for distributed mechanism design. It consists of a sequence of layers. The lower layers deal with the operations relevant for distributed computing only, while the upper layers are concerned only with…
Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored…
We propose an asynchronous iterative scheme that allows a set of interconnected nodes to distributively reach an agreement within a pre-specified bound in a finite number of steps. While this scheme could be adopted in a wide variety of…
We present \code{phaser}, an open-source Python package that provides a unified interface to both conventional and gradient descent-based ptychographic algorithms. Features such as mixed-state probe, probe position correction, and…
In this study, we analyze the feasibility of distributed phase synchronization for coherent signal superposition, a fundamental enabler for paradigms such as coherent over-the-air computation (OAC), distributed beamforming, and interference…
We discuss the distributed matching scheme in accelerators where control of transverse beam phase space, oscillation, and transport is accomplished by flexible distribution of focusing elements beyond dedicated matching sections. Besides…
Large-scale neuromorphic architectures consist of computing tiles that communicate spikes using a shared interconnect. The communication patterns in these systems are inherently sparse, asynchronous, and localized, as neural activity is…
A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, at first the large computational problem is partitioned into several tasks…
Modern large-scale scientific applications consist of thousands to millions of individual tasks. These tasks involve not only computation but also communication with one another. Typically, the communication pattern between tasks is sparse…
The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many…
With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each…
Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this…
We present an approach which enables to identify phase synchronization in coupled chaotic oscillators without having to explicitly measure the phase. We show that if one defines a typical event in one oscillator and then observes another…
To support parallelizable serverless workflows in applications like media processing, we have prototyped a distributed scheduler called Raptor that reduces both the end-to-end delay time and failure rate of parallelizable serverless…