Related papers: Exploring DRAM Cache Prefetching for Pooled Memory

Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-23 Jacob Wahlgren, Maya Gokhale, Ivy B. Peng

Prefetcher-based DRAM Architecture

Advancement in Processor technology has made it easy to handle data-intensive workloads, but limiting main memory advances has created performance bottlenecks. In DRAM, there have been improvements in DRAM access latency as well as…

Hardware Architecture · Computer Science 2021-05-24 Saurabh Jaiswal, Shailendra Kumar Gupta, Soumya Soubhagya Dandapat

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe, Miguel Avargues, Salvador Petit, Maria E. Gomez, Rui Yang, Guanhao Wang, Julio Sahuquillo

Cost-aware Joint Caching and Forwarding in Networks with Heterogeneous Cache Resources

Caching is crucial for enabling high-throughput networks for data intensive applications. Traditional caching technology relies on DRAM, as it can transfer data at a high rate. However, DRAM capacity is subject to contention by most system…

Networking and Internet Architecture · Computer Science 2023-10-12 Faruk Volkan Mutlu, Edmund Yeh

PUL: Pre-load in Software for Caches Wouldn't Always Play Along

Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…

Databases · Computer Science 2025-06-23 Arthur Bernhardt, Sajjad Tamimi, Florian Stock, Andreas Koch, Ilia Petrov

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to…

Hardware Architecture · Computer Science 2025-05-27 Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung

Performance Characterizations and Usage Guidelines of Samsung CXL Memory Module Hybrid Prototype

The growing prevalence of data-intensive workloads, such as artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), in-memory databases, and real-time analytics, has exposed limitations in conventional memory…

Hardware Architecture · Computer Science 2025-03-31 Jianping Zeng, Shuyi Pei, Da Zhang, Yuchen Zhou, Amir Beygi, Xuebin Yao, Ramdas Kachare, Tong Zhang, Zongwang Li, Marie Nguyen, Rekha Pitchumani, Yang Soek Ki, Changhee Jung

Pond: CXL-Based Memory Pooling Systems for Cloud Platforms

Public cloud providers seek to meet stringent performance requirements and low hardware cost. A key driver of performance and cost is main memory. Memory pooling promises to improve DRAM utilization and thereby reduce costs. However,…

Operating Systems · Computer Science 2022-10-25 Huaicheng Li, Daniel S. Berger, Stanko Novakovic, Lisa Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura, Ricardo Bianchini

A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Jacob Wahlgren, Gabin Schieffer, Maya Gokhale, Ivy Peng

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Data prefetching, i.e., the act of predicting application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour, Mehran Shakerinava, Fatemeh Golshan, Ali Ansari, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang, Zhiwei Luo

Pickle Prefetcher: Programmable and Scalable Last-Level Cache Prefetcher

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen, Pongstorn Maidee, Jason Lowe-Power, Alireza Kaviani

CXLMemUring: A Hardware Software Co-design Paradigm for Asynchronous and Flexible Parallel CXL Memory Pool Access

CXL has been the emerging technology for expanding memory for both the host CPU and device accelerators with load/store interface. Extending memory coherency to the PCIe root complex makes the codesign more flexible in that you can access…

Hardware Architecture · Computer Science 2023-09-11 Yiwei Yang

Reducing Load Latency with Cache Level Prediction

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili, Mattan Erez

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

Reducing the average memory access time is crucial for improving the performance of applications running on multi-core architectures. With workload consolidation this becomes increasingly challenging due to shared resource contention.…

Hardware Architecture · Computer Science 2021-02-24 Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs

CXLMemSim: A pure software simulated CXL.mem for performance characterization

CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory…

Performance · Computer Science 2025-06-18 Yiwei Yang, Brian Zhao, Yusheng Zheng, Pooneh Safayenikoo, Tanvir Ahmed Khan, Andi Quinn

DPC: A Distributed Page Cache over CXL

Modern distributed file systems rely on uncoordinated, per node page caches that replicate hot data locally across the cluster. While ensuring fast local access, this architecture underutilizes aggregate cluster DRAM capacity through…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-22 Shai Bergman, Zhe Yang, Julien Eudine, Giorgio Negro, Onur Mutlu, Arash Tavakkol, Ji Zhang

CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers

Memory resources in data centers generally suffer from low utilization and lack of dynamics. Memory disaggregation solves these problems by decoupling CPU and memory, which currently includes approaches based on RDMA or interconnection…

Hardware Architecture · Computer Science 2023-02-23 Chenjiu Wang, Ke He, Ruiqi Fan, Xiaonan Wang, Yang Kong, Wei Wang, Qinfen Hao

CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach

In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-22 Yehonatan Fridman, Suprasad Mutalik Desai, Navneet Singh, Thomas Willhalm, Gal Oren

ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

Compute Express Link (CXL) emerges as a solution for wide gap between computational speed and data communication rates among host and multiple devices. It fosters a unified and coherent memory space between host and CXL storage devices such…

Hardware Architecture · Computer Science 2024-08-13 Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao