Related papers: A Scalable, Portable, and Memory-Efficient Lock-Fr…
The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, only a fraction of them have high performance. In addition, many queue designs, such…
FIFO queues are a fundamental data structure used in a wide range of applications. Concurrent FIFO queues allow multiple execution threads to access the queue simultaneously. Maintaining strict FIFO semantics in concurrent queues leads to…
The queue is conceptually one of the simplest data structures-a basic FIFO container. However, ensuring correctness in the presence of concurrency makes existing lock-free implementations significantly more complex than their original form.…
In applications such as sharded data processing systems, sharded in-memory key-value stores, data flow programming and load sharing applications, multiple concurrent data producers are feeding requests into the same data consumer. This can…
Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm,…
FIFO queues with a single reader and writer can be insufficient for "hard real-time" systems where interrupt handlers require wait-free guarantees when writing to message queues. We present an algorithm which elegantly and practically…
In this paper, we study the question whether techniques employed, in a conventional system, by state-of-the-art concurrent algorithms to avoid contended hot spots are still efficient for recoverable computing in settings with Non-Volatile…
In high performance systems it is sometimes hard to build very large graphs that are efficient both with respect to memory and compute. This paper proposes a data structure called Markov-chain-priority-queue (MCPrioQ), which is a lock-free…
Multi-port memory controllers (MPMCs) have become increasingly important in many modern applications due to the tremendous growth in bandwidth requirement. Many approaches so far have focused on improving either the memory access latency or…
The conventional approach to the fronthaul design for cell-free massive MIMO system follows the compress-and-precode (CP) paradigm. Accordingly, encoded bits and precoding coefficients are shared by the distributed unit (DU) on the…
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is…
Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given $n$ threads and $m\ge n$ distinct priority queues, task insertions are performed uniformly…
Priority queues are abstract data structures which store a set of key/value pairs and allow efficient access to the item with the minimal (maximal) key. Such queues are an important element in various areas of computer science such as…
For decades, advances in electronics were directly driven by the scaling of CMOS transistors according to Moore's law. However, both the CMOS scaling and the classical computer architecture are approaching fundamental and practical limits,…
Approximate Membership Query (AMQ) structures are essential for high-throughput systems in databases, networking, and bioinformatics. While Bloom filters offer speed, they lack support for deletions. Existing GPU-based dynamic alternatives,…
We explore the problem of efficiently implementing shared data structures in an asynchronous computing environment. We start with a traditional FIFO queue, showing that full replication is possible with a delay of only a single round-trip…
Packet scheduling is a fundamental networking task that recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems such as those based on Push-In First-Out (PIFO) abstraction…
Using efficient point-to-point communication channels is critical for implementing fine grained parallel program on modern shared cache multi-core architectures. This report discusses in detail several implementations of wait-free…
We present "Reciprocating Locks", a novel mutual exclusion locking algorithm, targeting cache-coherent shared memory (CC), that enjoys a number of desirable properties. The doorway arrival phase and the release operation both run in…
How to meet the demand for increasing number of users, higher data rates, and stringent quality-of-service (QoS) in the beyond fifth-generation (B5G) networks? Cell-free massive multiple-input multiple-output (MIMO) is considered as a…