Related papers: Cornus: Atomic Commit for a Cloud DBMS with Storag…
Context: Concurrent objects with asynchronous messaging are an increasingly popular way to structure highly available, high performance, large-scale software systems. To ensure data-consistency and support synchronization between objects…
Atomic commit protocols are used where data integrity is more important than data availability. Two-Phase commit (2PC) is a standard commit protocol for commercial database management systems. To reduce certain drawbacks in 2PC protocol…
Modern distributed databases face challenges in achieving transactional consistency across distributed partitions. Traditional two-phase commit (2PC) protocols incur high coordination overhead and latency, and require complex recovery for…
Two-phase-commit (2PC) has been widely adopted for distributed transaction processing, but it also jeopardizes throughput by introducing two rounds of network communications and two durable log writes to a transaction's critical path.…
In a geo-distributed database, data shards and their respective replicas are deployed in distinct datacenters across multiple regions, enabling regional-level disaster recovery and the ability to serve global users locally. However,…
Consensus protocols are the foundation for building fault-tolerant, distributed systems, and services. They are also widely acknowledged as performance bottlenecks. Several recent systems have proposed accelerating these protocols using the…
In distributed transaction processing, atomic commit protocol (ACP) is used to ensure database consistency. With the use of commodity compute nodes and networks, failures such as system crashes and network partitioning are common. It is…
Disaggregated memory (DM) separates compute and memory resources, allowing flexible scaling to achieve high resource utilization. To ensure atomic and consistent data access on DM, distributed transaction systems have been adapted, where…
Many distributed systems require coordination between the components involved. With the steady growth of such systems, the probability of failures increases, which necessitates scalable fault-tolerant agreement protocols. The most common…
The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach…
Efficient LLM inference is critical for real-world applications, especially within heterogeneous GPU clusters commonly found in organizations and on-premise datacenters as GPU architecture rapidly evolves. Current disaggregated prefill…
This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). DPUs can optimize the latency and CPU consumption of disaggregated storage servers.…
Existing disaggregated databases separate execution and storage layers, enabling independent and elastic scaling of resources. In most cases, this design makes transaction concurrency control (CC) a critical bottleneck, which demands…
Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is…
This paper presents a novel approach to handle the computational complexity in security-constrained unit commitment (SCUC) with corrective network reconfiguration (CNR) to harness the flexibility in transmission networks. This is achieved…
Highly-available datastores are widely deployed for online applications. However, many online applications are not contented with the simple data access interface currently provided by highly-available datastores. Distributed transaction…
Although significant recent progress has been made in improving the multi-core scalability of high throughput transactional database systems, modern systems still fail to achieve scalable throughput for workloads involving frequent access…
A conventional data center that consists of monolithic-servers is confronted with limitations including lack of operational flexibility, low resource utilization, low maintainability, etc. Resource disaggregation is a promising solution to…
Two-phase locking (2PL) is a fundamental and widely used concurrency control protocol. It regulates concurrent access to database data by following a specific sequence of acquiring and releasing locks during transaction execution, thereby…
Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the…