Related papers: Rethinking State-Machine Replication for Paralleli…
State-machine replication, a fundamental approach to fault tolerance, requires replicas to execute commands deterministically, which usually results in sequential execution of commands. Sequential execution limits performance and underuses…
State machine replication is standard approach to fault tolerance. One of the key assumptions of state machine replication is that replicas must execute operations deterministically and thus serially. To benefit from multi-core servers,…
Linearizability is a well-known correctness property for concurrent and distributed systems. In the past, it was also used to prove the design and implementation of replicated state-machines correct. State-machine replication (SMR) is a…
State Machine Replication (SMR) is a fundamental approach to designing service with fault tolerance. However, its requirement for the deterministic execution of transactions often results in single-threaded replicas, which cannot fully…
Modern Internet services commonly replicate critical data across several geographical locations using state-machine replication (SMR). Due to their reliance on a leader replica, classical SMR protocols offer limited scalability and…
State machine replication (SMR) is a replication technique that ensures fault tolerance by duplicating a service. Geo-replicated SMR is an enhanced version of SMR that distributes replicas in separate geographical locations, making the…
This paper considers the classical state machine replication (SMR) problem in a distributed system model inspired by cross-chain exchanges. We propose a novel SMR protocol adapted for this model. Each state machine transition takes $O(n)$…
With the slowdown of Moore's law, CPU-oriented packet processing in software will be significantly outpaced by emerging line speeds of network interface cards (NICs). Single-core packet-processing throughput has saturated. We consider the…
Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. Advanced SSMs like S5 and S6 (Mamba) in addressing non-uniform sampling, their recursive structures impede efficient SSM…
We present a new state transfer method for geographic State Machine Replication (SMR) that dynamically allocates the state to be transferred among replicas according to changes in communication bandwidths. SMR is a method that improves…
State-space models (SSMs) have recently attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast…
Consensus, state-machine replication (SMR) and total order broadcast (TOB) protocols are notorious for being poorly scalable with the number of participating nodes. Despite the recent race to reduce overall message complexity of…
Geographic state machine replication (SMR) is a replication method in which replicas of a service are located on multiple continents to improve the fault tolerance of a general service. Nowadays, geographic SMR is easily realized using…
This paper proposes a parallelizable algorithm for linear-quadratic model predictive control (MPC) problems with state and input constraints. The algorithm itself is based on a parallel MPC scheme that has originally been designed for…
Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large amount of replications, it is a good practice to speed-up the whole simulation time using the…
This paper proposes a new state transfer method for geographic state machine replication (SMR) that dynamically allocates the state to be transferred among replicas according to changes in communication bandwidths. SMR improves fault…
Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however - especially if we desire system-enforced determinism, ensuring…
We introduce a parallelizable simplification of Neural Turing Machine (NTM), referred to as P-NTM, which redesigns the core operations of the original architecture to enable efficient scan-based parallel execution. We evaluate the proposed…
Synchronous programs are used extensively in implementation of safety critical embedded software. Imperative synchronous programming languages model multiple Finite State Machines (FSMs) executing in lockstep at logical clock ticks. The…
In large-scale LLM pre-training systems with 100k+ GPUs, failures become the norm rather than the exception, and restart costs can dominate wall-clock training time. However, existing fault-tolerance mechanisms are largely unprepared for…