Related papers: Parallel Deferred Update Replication
We propose Hybrid Transactional Replication (HTR), a novel replication scheme for highly dependable services. It combines two schemes: a transaction is executed either optimistically by only one service replica in the deferred update mode…
Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process…
This article summarizes the idea of "refresh-access parallelism," which was published in HPCA 2014, and examines the work's significance and future potential. The overarching objective of our HPCA 2014 paper is to reduce the significant…
Maintaining a dynamic $k$-core decomposition is an important problem that identifies dense subgraphs in dynamically changing graphs. Recent work by Liu et al. [SPAA 2022] presents a parallel batch-dynamic algorithm for maintaining an…
State-machine replication, a fundamental approach to designing fault-tolerant services, requires commands to be executed in the same order by all replicas. Moreover, command execution must be deterministic: each replica must produce the…
In this paper we propose a novel approach to manage the throughput vs latency tradeoff that emerges when managing updates in geo-replicated systems. Our approach consists in allowing full concurrency when processing local updates and using…
Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests…
In this work we study parallelization of online learning, a core primitive in machine learning. In a parallel environment all known approaches for parallel online learning lead to delayed updates, where the model is updated using…
Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire rank from serving memory requests…
Modern large-scale recommender systems employ multi-stage ranking funnel (Retrieval, Pre-ranking, Ranking) to balance engagement and computational constraints (latency, CPU). However, the initial retrieval stage, often relying on efficient…
Deterministic database systems have received increasing attention from the database research community in recent years. Despite their current limitations, recent proposals of distributed deterministic transaction processing systems…
With the slowdown of Moore's law, CPU-oriented packet processing in software will be significantly outpaced by emerging line speeds of network interface cards (NICs). Single-core packet-processing throughput has saturated. We consider the…
Iterative graph algorithms often compute intermediate values and update them as computation progresses. Updated output values are used as inputs for computations in current or subsequent iterations; hence the number of iterations required…
Parallel p-bit Ising machines are a promising platform for fast and energy-efficient combinatorial optimization, but their scalability depends on update synchronization, hardware delay, and architectural cost. In this work, we establish a…
The Deferred Correction (DeC) is an iterative procedure, characterized by increasing accuracy at each iteration, which can be used to design numerical methods for systems of ODEs. The main advantage of such framework is the automatic way of…
In the realm of shared memory systems, the challenge of reader-writer synchronization is closely coupled with the potential for readers to access outdated updates. Read-Copy-Update (RCU) is a synchronization primitive that allows for…
Consistency in data storage systems requires any read operation to return the most recent written version of the content. In replicated storage systems, consistency comes at the price of delay due to large-scale write and read operations.…
Compared to replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been pro- posed. Nevertheless, due to the long update…
In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock, that takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas Rachford, ADMM,…
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes…