Related papers: Self-Stabilizing Paxos
Building consensus sequences based on distributed, fault-tolerant consensus, as used for replicated state machines, typically requires a separate distributed state for every new consensus instance. Allocating and maintaining this state…
Virtual synchrony is an important abstraction that is proven to be extremely useful when implemented over asynchronous, typically large, message-passing distributed systems. Fault tolerant design is a key criterion for the success of such…
Current reconfiguration techniques are based on starting the system in a consistent configuration, in which all participating entities are in their initial state. Starting from that state, the system must preserve consistency as long as a…
Agreement among a set of processes and in the presence of partial failures is one of the fundamental problems of distributed systems. In the most general case, many decisions must be agreed upon over the lifetime of a system with…
State machine replication protocols, like MultiPaxos and Raft, are at the heart of nearly every strongly consistent distributed database. To tolerate machine failures, these protocols must replace failed machines with live machines, a…
The behavior and architecture of large scale discrete state systems found in computer software and hardware can be specified and analyzed using a particular class of primitive recursive functions. This paper begins with an illustration of…
The ability to perform repeated Byzantine agreement lies at the heart of important applications such as blockchain price oracles or replicated state machines. Any such protocol requires the following properties: (1) \textit{Byzantine…
Self-stabilization is a versatile methodology in the design of fault-tolerant distributed algorithms for transient faults. A self-stabilizing system automatically recovers from any kind and any finite number of transient faults. This…
Lamport's Paxos algorithm is a classic consensus protocol for state machine replication in environments that admit crash failures. Many versions of Paxos exploit the protocol's intrinsic properties for the sake of gaining better run-time…
Reconfigurable state machine replication is an important enabler of elasticity for replicated cloud services, which must be able to dynamically adjust their size as a function of changing load and resource availability. We introduce a new…
Distributed consensus, the ability to reach agreement in the face of failures, is a fundamental primitive for constructing reliable distributed systems. The Paxos algorithm is synonymous with consensus and widely utilized in production.…
Self-stabilization is a versatile fault-tolerance approach that characterizes the ability of a system to eventually resume a correct behavior after any finite number of transient faults. In this paper, we propose a self-stabilizing reset…
Byzantine agreement algorithms typically assume implicit initial state consistency and synchronization among the correct nodes and then operate in coordinated rounds of information exchange to reach agreement based on the input values. The…
Numerous distributed applications, such as cloud computing and distributed ledgers, necessitate the system to invoke asynchronous consensus objects an unbounded number of times, where the completion of one consensus instance is followed by…
A self-stabilizing protocol has the capacity to recover a legitimate behavior whatever is its initial state. The majority of works in self-stabilization assume a shared memory model or a communication using reliable and FIFO channels. In…
In this paper we provide a first-ever epistemic formulation of stabilizing agreement, defined as the non-terminating variant of the well established consensus problem. In stabilizing agreements, agents are given (possibly different) initial…
Self-stabilization is a general paradigm to provide forward recovery capabilities to distributed systems and networks. Intuitively, a protocol is self-stabilizing if it is able to recover without external intervention from any catastrophic…
Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are…
Distributed consensus, the ability to reach agreement in the face of failures and asynchrony, is a fundamental primitive for constructing reliable distributed systems from unreliable components. The Paxos algorithm is synonymous with…
In the stabilizing consensus problem, each agent of a networked system has an input value and is repeatedly writing an output value; it is required that eventually all the output values stabilize to the same value which, moreover, must be…