Related papers: High-Performance Distributed RMA Locks

BRAVO -- Biased Locking for Reader-Writer Locks

Designers of modern reader-writer locks confront a difficult trade-off related to reader scalability. Locks that have a compact memory representation for active readers will typically suffer under high intensity read-dominated workloads…

Operating Systems · Computer Science 2019-07-11 David Dice, Alex Kogan

Coded Distributed Computing for Hierarchical Multi-task Learning

In this paper, we consider a hierarchical distributed multi-task learning (MTL) system where distributed users wish to jointly learn different models orchestrated by a central server with the help of a layer of multiple relays. Since the…

Information Theory · Computer Science 2022-12-19 Haoyang Hu, Songze Li, Minquan Cheng, Youlong Wu

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-05 Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching

Triangle count and local clustering coefficient are two core metrics for graph analysis. They find broad application in analyses such as community detection and link recommendation. Current state-of-the-art solutions suffer from…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-02 András Strausz, Flavio Vella, Salvatore Di Girolamo, Maciej Besta, Torsten Hoefler

Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems

The relaxed semantics and rich functionality of one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such an implementation suffers from the fact that…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-08 Huan Zhou, Kamran Idrees, José Gracia

Distributed Locking: Performance Analysis and Optimization Strategies

Distributed locking mechanisms are fundamental to ensuring data consistency and integrity in distributed systems. This paper presents a comprehensive analysis of distributed locking algorithms, focusing on their performance characteristics…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-07 Andre Rodriguez, William Osborn

Designing Scalable Rate Limiting Systems: Algorithms, Architecture, and Distributed Solutions

Designing a rate limiter that is simultaneously accurate, available, and scalable presents a fundamental challenge in distributed systems, primarily due to the trade-offs between algorithmic precision, availability, consistency, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-13 Bo Guan

Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication

Collective communication operations such as MPI_Alltoallv are central to many HPC applications, particularly those with irregular message sizes. We design, implement, and evaluate persistent MPI RMA variants of Alltoallv based on fence and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-08 Evelyn Namugwanya

Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory

Memory disaggregation architecture physically separates CPU and memory into independent components, which are connected via high-speed RDMA networks, greatly improving resource utilization of databases. However, such an architecture poses…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-21 Qing Wang, Youyou Lu, Jiwu Shu

A fast MPI-based Distributed Hash-Table as Surrogate Model demonstrated in a coupled reactive transport HPC simulation

Surrogate models can play a pivotal role in enhancing performance in contemporary High-Performance Computing applications. Cache-based surrogates use already calculated simulation results to interpolate or extrapolate further simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-22 Max Lübke, Marco De Lucia, Steffen Christgau, Stefan Petri, Bettina Schnor

Using RDMA for Lock Management

In this work, we aim to evaluate different Distributed Lock Management service designs with Remote Direct Memory Access (RDMA). Specifically, we implement and evaluate the centralized and the RDMA-enabled lock manager designs for fast…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-21 Yeounoh Chung, Erfan Zamanian

Distributed Multi-writer Multi-reader Atomic Register with Optimistically Fast Read and Write

A distributed multi-writer multi-reader (MWMR) atomic register is an important primitive that enables a wide range of distributed algorithms. Hence, improving its performance can have large-scale consequences. Since the seminal work of ABD…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-20 Lewis Tseng, Neo Zhou, Cole Dumas, Tigran Bantikyan, Roberto Palmieri

An efficient distributed scheduling algorithm for relay-assisted mmWave backhaul networks

In this paper, a novel distributed scheduling algorithm is proposed, which aims to efficiently schedule both the uplink and downlink backhaul traffic in the relay-assisted mmWave backhaul network with a tree topology. The handshaking of…

Networking and Internet Architecture · Computer Science 2022-02-17 Qiang Hu, Yuchen Liu, Yan Yan, Miao Liu, Jun Zheng, Douglas M. Blough

A Non-blocking Buddy System for Scalable Memory Allocation on Multi-core Machines

Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is clearly not prone to scale with large thread…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-22 Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, Andrea Scarselli, Francesco Quaglia

Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure

Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet…

Systems and Control · Electrical Eng. & Systems 2025-10-22 Mohammadreza Doostmohammadian, Sergio Pequito

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target which complicates implementation and limits performance of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta, Torsten Hoefler

Rank-Aware Resource Scheduling for Tightly-Coupled MPI Workloads on Kubernetes

Fully provisioned Message Passing Interface (MPI) parallelism achieves near-optimal wall-clock time for Computational Fluid Dynamics (CFD) solvers. This work addresses a complementary question for shared, cloud-managed clusters: can…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-25 Tianfang Xie

SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems

The memory capacity in edge devices is often limited due to constraints on cost, size, and power. Consequently, memory competition leads to inevitable page swapping in memory-constrained mixed-criticality edge devices, causing slow storage…

Operating Systems · Computer Science 2025-11-26 Meng-Chia Lee, Wen Sheng Lim, Yuan-Hao Chang, Tei-Wei Kuo

Efficient Planning of Multi-Robot Collective Transport using Graph Reinforcement Learning with Higher Order Topological Abstraction

Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call…

Multiagent Systems · Computer Science 2023-08-21 Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury

Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Efficient task scheduling in large-scale distributed systems presents significant challenges due to dynamic workloads, heterogeneous resources, and competing quality-of-service requirements. Traditional centralized approaches face…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-27 Daniel Benniah John