Related papers: Effectively Prefetching Remote Memory with Leap

Systems for Memory Disaggregation: Challenges & Opportunities

Memory disaggregation addresses memory imbalance in a cluster by decoupling CPU and memory allocations of applications while also increasing the effective memory capacity for (memory-intensive) applications beyond the local memory limit…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-07 Anil Yelam

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Using memory located on remote machines, or far memory, as a swap space is a promising approach to meet the increasing memory demands of modern datacenter applications. Operating systems have long relied on prefetchers to mask the increased…

Operating Systems · Computer Science 2022-07-19 Christopher Branner-Augmon, Narek Galstyan, Sam Kumar, Emmanuel Amaro, Amy Ousterhout, Aurojit Panda, Sylvia Ratnasamy, Scott Shenker

CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers

Memory resources in data centers generally suffer from low utilization and lack of dynamics. Memory disaggregation solves these problems by decoupling CPU and memory, which currently includes approaches based on RDMA or interconnection…

Hardware Architecture · Computer Science 2023-02-23 Chenjiu Wang, Ke He, Ruiqi Fan, Xiaonan Wang, Yang Kong, Wei Wang, Qinfen Hao

Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers

Memory disaggregation is being considered as a strong alternative to traditional architecture to deal with the memory under-utilization in data centers. Disaggregated memory can adapt to dynamically changing memory requirements for the data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-11 Amit Puri, John Jose, Tamarapalli Venkatesh

Handling of Memory Page Faults during Virtual-Address RDMA

Nowadays, avoiding system calls during cluster communication (e.g., in Data Centers and High Performance Computing) in modern high-speed interconnection networks has become a necessity, due to the high overhead of multiple data copies…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-27 Antonis Psistakis

INDIGO: Page Migration for Hardware Memory Disaggregation Across a Network

Hardware memory disaggregation (HMD) is an emerging technology that enables access to remote memory, thereby creating expansive memory pools and reducing memory underutilization in datacenters. However, a significant challenge arises when…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-25 Archit Patke, Christian Pinto, Saurabh Jha, Haoran Qiu, Zbigniew Kalbarczyk, Ravishankar Iyer

Hardware-assisted Trusted Memory Disaggregation for Secure Far Memory

Memory disaggregation provides efficient memory utilization across network-connected systems. It allows a node to use part of memory in remote nodes in the same cluster. Recent studies have improved RDMA-based memory disaggregation systems,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-07 Taekyung Heo, Seunghyo Kang, Sanghyeon Lee, Soojin Hwang, Jaehyuk Huh

Memory Disaggregation: Advances and Open Challenges

Compute and memory are tightly coupled within each server in traditional datacenters. Large-scale datacenter operators have identified this coupling as a root cause behind fleet-wide resource underutilization and increasing Total Cost of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-09 Hasan Al Maruf, Mosharaf Chowdhury

A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Jacob Wahlgren, Gabin Schieffer, Maya Gokhale, Ivy Peng

From RDMA to RDCA: Toward High-Speed Last Mile of Data Center Networks Using Remote Direct Cache Access

In this paper, we conduct systematic measurement studies to show that the high memory bandwidth consumption of modern distributed applications can lead to a significant drop of network throughput and a large increase of tail latency in…

Networking and Internet Architecture · Computer Science 2023-03-28 Qiang Li, Qiao Xiang, Derui Liu, Yuxin Wang, Haonan Qiu, Xiaoliang Wang, Jie Zhang, Ridi Wen, Haohao Song, Gexiao Tian, Chenyang Huang, Lulu Chen, Shaozong Liu, Yaohui Wu, Zhiwu Wu, Zicheng Luo, Yuchao Shao, Chao Han, Zhongjie Wu, Jianbo Dong, Zheng Cao, Jinbo Wu, Jiwu Shu, Jiesheng Wu

Taking the Leap: Efficient and Reliable Fine-Grained NUMA Migration in User-space

Modern multi-socket architectures offer a single virtual address space, but physically divide main-memory across multiple regions, where each region is attached to a CPU and its cores. While this simplifies the usage, developers must be…

Databases · Computer Science 2026-02-06 Felix Schuhknecht, Nick Rassau

RFP: A Remote Fetching Paradigm for RDMA-Accelerated Systems

Remote Direct Memory Access (RDMA) is an efficient way to improve the performance of traditional client-server systems. Currently, there are two main design paradigms for RDMA-accelerated systems. The first allows the clients to directly…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-12-25 Maomeng Su, Mingxing Zhang, Kang Chen, Yongwei Wu, Guoliang Li

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

Optimising Virtual Resource Mapping in Multi-Level NUMA Disaggregated Systems

Disaggregated systems have a novel architecture motivated by the requirements of resource intensive applications such as social networking, search, and in-memory databases. The total amount of resources such as memory and CPU cores is very…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-03 Ewnetu Bayuh Lakew, Petter Svärd, Erik Elmroth, Johan Tordsson

A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-25 Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe, Miguel Avargues, Salvador Petit, Maria E. Gomez, Rui Yang, Guanhao Wang, Julio Sahuquillo

Learning Semantics, Not Addresses: Runtime Neural Prefetching for Far Memory

Memory prefetching has long boosted CPU caches and is increasingly vital for far-memory systems, where large portions of memory are offloaded to cheaper, remote tiers. While effective prefetching requires accurate prediction of future…

Machine Learning · Computer Science 2025-10-07 Yutong Huang, Zhiyuan Guo, Yiying Zhang

DaeMon: Architectural Support for Efficient Data Movement in Disaggregated Systems

Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory,…

Hardware Architecture · Computer Science 2023-01-20 Christina Giannoula, Kailong Huang, Jonathan Tang, Nectarios Koziris, Georgios Goumas, Zeshan Chishti, Nandita Vijaykumar

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

LLM inference is constrained by GPU memory capacity and bandwidth. Tiered memory architectures mitigate this by allowing the GPU to offload memory to the remote tier. However, existing memory offloading frameworks rely on prefetching data…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-30 Shouxu Lin, Zhiyuan Guo, Jiaxin Lin

Fault Tolerance for Remote Memory Access Programming Models

Remote Memory Access (RMA) is an emerging mechanism for programming high-performance computers and datacenters. However, little work exists on resilience schemes for RMA-based applications and systems. In this paper we analyze fault…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta, Torsten Hoefler