Related papers: A Dynamic Elimination-Combining Stack Algorithm
We present a new blocking linearizable stack implementation which utilizes sharding and fetch&increment to achieve significantly better performance than all existing concurrent stacks. The proposed implementation is based on a novel…
Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are…
The sequential semantics of many concurrent data structures, such as stacks and queues, inevitably lead to memory contention in parallel environments, thus limiting scalability. Semantic relaxation has the potential to address this issue,…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
Synchronization is the major obstacle to scalability in distributed computing. Concurrent operations on the shared data engage in synchronization when they encounter a \emph{conflict}, i.e., their effects depend on the order in which they…
There has been a significant amount of work in the literature proposing semantic relaxation of concurrent data structures for improving scalability and performance. By relaxing the semantics of a data structure, a bigger design space, that…
Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…
The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…
The close integration of increasing operating units into an integrated energy system (IES) results in complex interconnections between these units. The strong dynamic interactions create barriers to designing a successful distributed…
We propose an asynchronous, decentralized algorithm for consensus optimization. The algorithm runs over a network in which the agents communicate with their neighbors and perform local computation. In the proposed algorithm, each agent can…
Communication scheduling aims to reduce communication bottlenecks in data parallel training (DP) by maximizing the overlap between computation and communication. However, existing schemes fall short due to three main issues: (1) hard data…
We study the performance power of software combining in designing persistent algorithms and data structures. We present Bcomb, a new blocking highly-efficient combining protocol, and built upon it to get PBcomb, a persistent version of it…
Comparative evaluation of several systems is a recurrent task in researching. It is a key step before deciding which system to use for our work, or, once our research has been conducted, to demonstrate the potential of the resulting model.…
The dynamic provisioning of virtualized resources offered by cloud computing infrastructures allows applications deployed in a cloud environment to automatically increase and decrease the amount of used resources. This capability is called…
Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems.…
We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to…
Maintaining a dynamic $k$-core decomposition is an important problem that identifies dense subgraphs in dynamically changing graphs. Recent work by Liu et al. [SPAA 2022] presents a parallel batch-dynamic algorithm for maintaining an…
Data-structure dynamization is a general approach for making static data structures dynamic. It is used extensively in geometric settings and in the guise of so-called merge (or compaction) policies in big-data databases such as Google…
Modern concurrent programming benefits from a large variety of synchronization techniques. These include conventional pessimistic locking, as well as optimistic techniques based on conditional synchronization primitives or transactional…
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the…