Related papers: Optimizing Distributed Protocols with Query Rewrit…
Consensus protocols are the foundation for building fault-tolerant distributed systems and services. They are also widely acknowledged as performance bottlenecks. Several recent systems have proposed accelerating these protocols using the…
In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication…
Traditional parallel schedulers running on cluster supercomputers support only static scheduling, where the number of processors allocated to an application remains fixed throughout the execution of the job. This results in…
Building consensus sequences based on distributed, fault-tolerant consensus, as used for replicated state machines, typically requires a separate distributed state for every new consensus instance. Allocating and maintaining this state…
State machine replication protocols, like MultiPaxos and Raft, are a critical component of many distributed systems and databases. However, these protocols offer relatively low throughput due to several bottlenecked components. Numerous…
Real-time scheduling and locking protocols are fundamental facilities for constructing time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs need mutually exclusive access to shared…
One of the traditional mechanisms used in distributed systems for maintaining the consistency of replicated data is voting. A problem with voting mechanisms is the size of the quorums needed on each access to the data. In this paper,…
Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter),…
Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases the latency for clients that are…
Agreement among a set of processes in the presence of partial failures is one of the fundamental problems of distributed systems. In the most general case, many decisions must be agreed upon over the lifetime of a system with…
This paper proposes distributed algorithms to solve robust convex optimization (RCO) when the constraints are affected by nonlinear uncertainty. We adopt a scenario approach by randomly sampling the uncertainty set. To facilitate the…
In dual decomposition, the dual to an optimization problem with a specific structure is solved in a distributed fashion using (sub)gradient and, more recently, fast gradient methods. The traditional dual decomposition suffers from two main…
Distributed locking mechanisms are fundamental to ensuring data consistency and integrity in distributed systems. This paper presents a comprehensive analysis of distributed locking algorithms, focusing on their performance characteristics…
In this paper we consider a novel partitioned framework for distributed optimization in peer-to-peer networks. In several important applications the agents of a network have to solve an optimization problem with two key features: (i) the…
There is no shortage of state machine replication protocols. From Generalized Paxos to EPaxos, a huge number of replication protocols have been proposed that achieve high throughput and low latency. However, these protocols all have two…
In this paper, we study unconstrained, strongly convex distributed optimization problems, in which the exchange of information in the network is captured by a directed graph topology over digital channels that have limited capacity (and…
Distributed quantum computing (DQC) is being actively investigated as a means of scaling the number of qubits across multiple connected quantum devices. This includes quantum circuit compilation and execution management on multiple quantum…
Distributed computing frameworks such as MapReduce and Spark are often used to process large-scale data computing jobs. In wireless scenarios, data exchange among distributed nodes suffers severely from the communication bottleneck…
System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool…
State machine replication protocols, like MultiPaxos and Raft, are at the heart of nearly every strongly consistent distributed database. To tolerate machine failures, these protocols must replace failed machines with live machines, a…