Related papers: Highly-Efficient Persistent FIFO Queues

BlockFIFO & MultiFIFO: Scalable Relaxed Queues

FIFO queues are a fundamental data structure used in a wide range of applications. Concurrent FIFO queues allow multiple execution threads to access the queue simultaneously. Maintaining strict FIFO semantics in concurrent queues leads to…

Data Structures and Algorithms · Computer Science · 2025-10-17 · Stefan Koch, Peter Sanders, Marvin Williams

Flat-Combining-Based Persistent Data Structures for Non-Volatile Memory

Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-12-10 · Matan Rusanovsky, Hagit Attiya, Ohad Ben-Baruch, Tom Gerby, Danny Hendler, Pedro Ramalhete

Delay-Free Concurrency on Faulty Persistent Memory

Non-volatile memory (NVM) promises persistent main memory that remains correct despite loss of power. This has sparked a line of research into algorithms that can recover from a system crash. Since caches are expected to remain volatile,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-06-22 · Naama Ben-David, Guy E. Blelloch, Michal Friedman, Yuanhao Wei

A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA safe and does not require any…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-08-14 · Ruslan Nikolaev

Persistent Software Combining

We study the performance power of software combining in designing persistent algorithms and data structures. We present Bcomb, a new blocking, highly efficient combining protocol, and build upon it to obtain PBcomb, a persistent version of it…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-02-27 · Panagiota Fatourou, Nikolaos D. Kallimanis, Eleftherios Kosmas

Aggregating Funnels for Faster Fetch&Add and Queues

Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-04 · Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun

Relaxation for Efficient Asynchronous Queues

We explore the problem of efficiently implementing shared data structures in an asynchronous computing environment. We start with a traditional FIFO queue, showing that full replication is possible with a delay of only a single round-trip…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-05 · Samuel Baldwin, Cole Hausman, Mohamed Bakr, Edward Talmage

High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing

Resilience is a major design goal for HPC. Checkpointing is the most common method to enable resilient HPC. Checkpointing periodically saves critical data objects to non-volatile storage to enable data persistence. However, using checkpointing, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-05-03 · Yingchao Huang, Kai Wu, Dong Li

Accelerating Non-volatile/Hybrid Processor Cache Design Space Exploration for Application Specific Embedded Systems

In this article, we propose a technique to accelerate design space exploration of nonvolatile, or hybrid volatile/nonvolatile, processor caches for application-specific embedded systems. Utilizing a novel cache behavior modeling equation…

Hardware Architecture · Computer Science · 2015-09-01 · Mohammad Shihabul Haque, Ang Li, Akash Kumar, Qingsong Wei

Oblivious Sorting and Queues

We present a deterministic oblivious LIFO (Stack), FIFO, double-ended and double-ended priority queue as well as an oblivious mergesort and quicksort algorithm. Our techniques and ideas include concatenating queues end-to-end, size…

Data Structures and Algorithms · Computer Science · 2016-12-13 · Johannes Schneider

Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm

Motivated by the observation that FIFO-based push-relabel algorithms are able to outperform highest label-based variants on modern, large maximum flow problem instances, we introduce an efficient implementation of the algorithm that uses…

Data Structures and Algorithms · Computer Science · 2015-07-27 · Niklas Baumstark, Guy Blelloch, Julian Shun

Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory

HPC applications pose high demands on I/O performance and storage capability. The emerging non-volatile memory (NVM) techniques offer low latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-05-11 · Wei Liu, Kai Wu, Jialin Liu, Feng Chen, Dong Li

Consistent implementation of characteristic flux-split based finite difference method for compressible multi-material flows

In order to prevent velocity, pressure, and temperature spikes at material discontinuities, which occur when interface-capturing schemes inconsistently simulate compressible multi-material flows (when the specific heat ratio is…

Computational Physics · Physics · 2020-12-29 · Zhiwei He, Yousheng Zhang, Li Li, Baolin Tian

Persistent Memory Programming Abstractions in Context of Concurrent Applications

The advent of non-volatile memory (NVM) technologies such as PCM, STT, memristors, and Fe-RAM is believed to enhance system performance by doing away with the traditional memory hierarchy and reducing the gap between memory and storage. This…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-12-15 · Ajay Singh, Marc Shapiro, Gael Thomas

Towards High Performance Quantum Computing (HPQ): Parallelisation of the Hamiltonian Auto Decomposition Optimisation Framework (HADOF)

Practical applicability of quantum optimisation on near term devices is constrained by limited qubit counts and hardware noise, which restricts the scalability of quantum optimisation algorithms for combinatorial problems. The simulation of…

Quantum Physics · Physics · 2026-05-01 · Namasi G Sankar, Georgios Miliotis, Simon Caton

Pronto: Federated Task Scheduling

We present a federated, asynchronous, memory-limited algorithm for online task scheduling across large-scale networks of hundreds of workers. This is achieved through recent advancements in federated edge computing that unlock the ability…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-29 · Andreas Grammenos, Evangelia Kalyvianaki, Peter Pietzuch

SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures

Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Even though several…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-06-12 · Christina Giannoula, Foteini Strati, Dimitrios Siakavaras, Georgios Goumas, Nectarios Koziris

Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by…

Data Structures and Algorithms · Computer Science · 2016-10-31 · Satya P. Jammy, Christian T. Jacobs, Neil D. Sandham

Hybrid Parallel Bidirectional Sieve based on SMP Cluster

In this article, a hybrid parallel bidirectional sieve method is implemented on an SMP cluster, in which the individual computational units, joined together by a communication network, are usually shared-memory systems with one or more multicore…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-05-23 · Gang Liao, Lian Luo, Lei Liu

Computational Algorithms for the Product Form Solution of Closed Queuing Networks with Finite Buffers and Skip-Over Policy

Closed queuing networks with finite capacity buffers and skip-over policies are fundamental models in the performance evaluation of computer and communication systems. This technical report presents the details of computational algorithms…

Performance · Computer Science · 2024-09-13 · Gianfranco Balbo, Andrea Marin, Diletta Olliaro, Matteo Sereno