Related papers: Blazes: Coordination Analysis for Distributed Prog…
Coordination protocols help programmers of distributed systems reason about the effects of transactions on the state of the system, but they're not cheap. Coordination protocols may involve multiple rounds of communication, which can hurt…
Coordination services and protocols are critical components of distributed systems and are essential for providing consistency, fault tolerance, and scalability. However, due to the lack of standard benchmarking and evaluation tools for…
Distributed storage systems and databases are widely used by various types of applications. Transactional access to these storage systems is an important abstraction allowing application programmers to consider blocks of actions (i.e.,…
Software bugs require developers to exert significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is…
A key concern in modern distributed systems is to avoid the cost of coordination while maintaining consistent semantics. Until recently, there was no answer to the question of when coordination is actually required. In this paper we present…
Complex systems often exhibit unexpected faults that are difficult to handle. Such systems are desirable to be diagnosable, i.e. faults can be automatically detected as they occur (or shortly afterwards), enabling the system to handle the…
Distributed locking mechanisms are fundamental to ensuring data consistency and integrity in distributed systems. This paper presents a comprehensive analysis of distributed locking algorithms, focusing on their performance characteristics…
This article describes a very high-level language for clear description of distributed algorithms and optimizations necessary for generating efficient implementations. The language supports high-level control flows where complex…
Datastores today rely on distribution and replication to achieve improved performance and fault-tolerance. But correctness of many applications depends on strong consistency properties - something that can impose substantial overheads,…
Serverless applications can be particularly difficult to troubleshoot, as these applications are often composed of various managed and partly managed services. Faults are often unpredictable and can occur at multiple points, even in simple…
Distributed fog and edge applications communicate over unreliable networks and are subject to high communication delays. This makes using existing distributed coordination technologies from cloud applications infeasible, as they are built…
Building consistent distributed systems has largely depended on complex coordination strategies that are not only tricky to implement, but also take a toll on performance as they require nodes to wait for coordination messages. In this…
Distributed averaging is among the most relevant cooperative control problems, with applications in sensor and robotic networks, distributed signal processing, data fusion, and load balancing. Consensus and gossip algorithms have been…
The robustness of distributed optimization is an emerging field of study, motivated by various applications of distributed optimization including distributed machine learning, distributed sensing, and swarm robotics. With the rapid…
Considerable Progress has been made in the last few years in improving the performance of the distributed database systems. The development of Fragment allocation models in Distributed database is becoming difficult due to the complexity of…
Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient…
The robustness of distributed systems is usually phrased in terms of the number of failures of certain types that they can withstand. However, these failure models are too crude to describe the different kinds of trust and expectations of…
Blockchain and distributed ledger technologies rely on distributed consensus algorithms. In recent years many consensus algorithms and protocols have been proposed; most of them are for permissioned blockchain networks. However, the…
Distributed systems are comprised of many components that communicate together to form an application. Distributed tracing gives us visibility into these complex interactions, but it can be difficult to reason about the system's behavior,…
The log-based analysis and trouble-shooting has remained prevalent and commonly used approach for centralized and time-haring systems. However, for parallel and distributed systems where happen-before relations are not directly available…