Related papers: Write-and-f-array: implementation and an applicati…
Relaxing the sequential specification of a shared object is a way to obtain an implementation with better performance compared to implementing the original specification. We apply this approach to the Counter object, under the assumption…
We study the design of storage-efficient algorithms for emulating atomic shared memory over an asynchronous, distributed message-passing system. Our first algorithm is an atomic single-writer multi-reader algorithm based on a novel…
An initializable array is an array that supports the read and write operations for any element and the initialization of the entire array. This paper proposes a simple in-place algorithm to implement an initializable array of length $N$…
Ferroelectric field effect transistor (FeFET) memory has shown the potential to meet the requirements of the growing need for fast, dense, low-power, and non-volatile memories. In this paper, we propose a memory architecture named…
For a write request, today flash storage cannot distinguish the logical object it comes from. In such object-oblivious flash devices, concurrent writes from different objects are simply packed in their arrival order to flash memory blocks;…
In the context of asynchronous concurrent shared-memory systems, a snapshot algorithm allows failure-prone processes to concurrently and atomically write on the entries of a shared array MEM , and also atomically read the whole array.…
We present durable implementations for two well known universal primitives -- CAS (compare-and-swap), and its ABA-free counter-part LLSC (load-linked, store-conditional). All our implementations are: writable, meaning they support a Write()…
In this work, we propose the $\lambda$-scanner snapshot, a variation of the snapshot object, which supports any fixed amount of $0 < \lambda \leq n$ different $SCAN$ operations being active at any given time. Whenever $\lambda$ is equal to…
Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by…
We present an approach for efficiently taking snapshots of the state of a collection of CAS objects. Taking a snapshot allows later operations to read the value that each CAS object had at the time the snapshot was taken. Taking a snapshot…
In recent years, information retrieval algorithms have taken center stage for extracting important data in ever larger datasets. Advances in hardware technology have lead to the increasingly wide spread use of flash storage devices. Such…
In this paper we introduce Jiffy, the first lock-free, linearizable ordered key-value index that offers both (1) batch updates, which are put and remove operations that are executed atomically, and (2) consistent snapshots used by, e.g.,…
We present a novel linearizable wait-free queue implementation using single-word CAS instructions. Previous lock-free queue implementations from CAS all have amortized step complexity of $\Omega(p)$ per operation in worst-case executions,…
This paper presents the first implementation of a search tree data structure in an asynchronous shared-memory system that provides a wait-free algorithm for executing range queries on the tree, in addition to non-blocking algorithms for…
In caching system, it is desirable to design a coded caching scheme with the transmission load $R$ and subpacketization $F$ as small as possible, in order to improve efficiency of transmission in the peak traffic times and to decrease…
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is…
We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention. In an AFT layer, the key and value are first combined with a set of learned position biases, the…
Distributed multi-writer atomic registers are at the heart of a large number of distributed algorithms. While enjoying the benefits of atomicity, researchers further explore fast implementations of atomic reigsters which are optimal in…
We present a multi-word atomic (1,N) register for multi-core machines exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and the readers in a wait-free manner. Our proposal, called Anonymous Readers Counting (ARC),…
Contrary to common belief, a recent work by Ellen, Gelashvili, Shavit, and Zhu has shown that computability does not require multicore architectures to support "strong" synchronization instructions like compare-and-swap, as opposed to…