Related papers: Resilient Cloud-based Replication with Low Latency
Traditionally, Byzantine fault tolerance (BFT) in geo-replicated systems is achieved by executing complex agreement protocols over large-distance communication links, and therefore typically incurs high response times. In this paper we…
Traditional resilient systems operate on fully-replicated fault-tolerant clusters, which limits their scalability and performance. One way to make the step towards resilient high-performance systems that can deal with huge workloads, is by…
Collaborative working is increasingly popular, but it presents challenges due to the need for high responsiveness and disconnected work support. To address these challenges the data is optimistically replicated at the edges of the network,…
We study a framework for modeling distributed network systems assisted by a reliable and powerful cloud service. Our framework aims at capturing hybrid systems based on a point to point message passing network of machines, with the…
Replicated services are inherently vulnerable to failures and security breaches. In a long-running system, it is, therefore, indispensable to maintain a reconfiguration mechanism that would replace faulty replicas with correct ones. An…
In P2P systems, large volumes of data are declustered naturally across a large number of peers. But it is very difficult to control the initial data distribution because every user has the freedom to share any data with other users. The…
Querying graph data with low latency is an important requirement in application domains such as social networks and knowledge graphs. Graph queries perform multiple hops between vertices. When data is partitioned and stored across multiple…
The accelerated digitalisation of society along with technological evolution have extended the geographical span of cyber-physical systems. Two main threats have made the reliable and real-time control of these systems challenging: (i)…
Grid Computing is a type of parallel and distributed systems that is designed to provide reliable access to data and computational resources in wide area networks. These resources are distributed in different geographical locations, however…
Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter),…
Minimizing end-to-end latency in geo-replicated systems usually makes it necessary to compromise on resilience, resource efficiency, or throughput performance, because existing approaches either tolerate only crashes, require additional…
The inherent connectivity and dependency of graph-structured data, combined with its unique topology-driven access patterns, pose fundamental challenges to conventional data replication and request routing strategies in geo-distributed…
This paper presents a resilient distributed algorithm for solving a system of linear algebraic equations over a multi-agent network in the presence of Byzantine agents capable of arbitrarily introducing untrustworthy information in…
The development of fault-tolerant distributed systems that can tolerate Byzantine behavior has traditionally been focused on consensus protocols, which support fully-replicated designs. For the development of more sophisticated…
Strong consistency replication helps keep application logic simple and provides significant benefits for correctness and manageability. Unfortunately, the adoption of strongly-consistent replication protocols has been curbed due to their…
Today's mainstream network timing models for distributed computing are synchrony, partial synchrony, and asynchrony. These models are coarse-grained and often make either too strong or too weak assumptions about the network. This paper…
In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always…
The large scale content distribution systems were improved broadly using the replication techniques. The demanded contents can be brought closer to the clients by multiplying the source of information geographically, which in turn reduce…
Service replication distributes an application over many processes for tolerating faults, attacks, and misbehavior among a subset of the processes. The established state-machine replication paradigm inherently requires the application to be…
Large-scale cyber-physical systems (CPS), such as railway control systems and smart grids, consist of geographically distributed subsystems that are connected via unreliable, asynchronous inter-region networks. Their scale and distribution…