Related papers: Highly-Efficient Persistent FIFO Queues
FIFO queues are a fundamental data structure used in a wide range of applications. Concurrent FIFO queues allow multiple execution threads to access the queue simultaneously. Maintaining strict FIFO semantics in concurrent queues leads to…
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is…
Non-volatile memory (NVM) promises persistent main memory that remains correct despite loss of power. This has sparked a line of research into algorithms that can recover from a system crash. Since caches are expected to remain volatile,…
We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA safe and does not require any…
We study the performance power of software combining in designing persistent algorithms and data structures. We present Bcomb, a new blocking highly-efficient combining protocol, and build upon it to get PBcomb, a persistent version of it…
Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by…
We explore the problem of efficiently implementing shared data structures in an asynchronous computing environment. We start with a traditional FIFO queue, showing that full replication is possible with a delay of only a single round-trip…
Resilience is a major design goal for HPC. Checkpoint is the most common method to enable resilient HPC. Checkpoint periodically saves critical data objects to non-volatile storage to enable data persistence. However, using checkpoint, we…
In this article, we propose a technique to accelerate nonvolatile or hybrid of volatile and nonvolatile processor cache design space exploration for application specific embedded systems. Utilizing a novel cache behavior modeling equation…
We present a deterministic oblivious LIFO (Stack), FIFO, double-ended and double-ended priority queue as well as an oblivious mergesort and quicksort algorithm. Our techniques and ideas include concatenating queues end-to-end, size…
Motivated by the observation that FIFO-based push-relabel algorithms are able to outperform highest label-based variants on modern, large maximum flow problem instances, we introduce an efficient implementation of the algorithm that uses…
HPC applications pose high demands on I/O performance and storage capability. The emerging non-volatile memory (NVM) techniques offer low latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stacks are…
In order to prevent velocity, pressure, and temperature spikes at material discontinuities occurring when the interface-capturing schemes inconsistently simulate compressible multi-material flows (when the specific heats ratio is…
The advent of non-volatile memory (NVM) technologies like PCM, STT, memristors and Fe-RAM is believed to enhance the system performance by getting rid of the traditional memory hierarchy by reducing the gap between memory and storage. This…
Practical applicability of quantum optimisation on near term devices is constrained by limited qubit counts and hardware noise, which restricts the scalability of quantum optimisation algorithms for combinatorial problems. The simulation of…
We present a federated, asynchronous, memory-limited algorithm for online task scheduling across large-scale networks of hundreds of workers. This is achieved through recent advancements in federated edge computing that unlock the ability…
Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Even though several…
Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by…
In this article, a hybrid parallel bidirectional sieve method is implemented on an SMP cluster, in which the individual computational units, joined together by the communication network, are usually shared-memory systems with one or more multicore…
Closed queuing networks with finite capacity buffers and skip-over policies are fundamental models in the performance evaluation of computer and communication systems. This technical report presents the details of computational algorithms…