Related papers: GPU Graph Processing on CXL-Based Microsecond-Late…

CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL…

Hardware Architecture · Computer Science 2025-06-19 Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung

Exploring and Evaluating Real-world CXL: Use Cases and System Adoption

Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…

Performance · Computer Science 2025-10-14 Xi Wang, Jie Liu, Jianbo Wu, Shuangyan Yang, Jie Ren, Bhanu Shankar, Dong Li

CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers

Memory resources in data centers generally suffer from low utilization and lack of dynamics. Memory disaggregation solves these problems by decoupling CPU and memory, which currently includes approaches based on RDMA or interconnection…

Hardware Architecture · Computer Science 2023-02-23 Chenjiu Wang, Ke He, Ruiqi Fan, Xiaonan Wang, Yang Kong, Wei Wang, Qinfen Hao

Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Hardware Architecture · Computer Science 2024-10-07 Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs

Modern analytics and recommendation systems are increasingly based on graph data that capture the relations between entities being analyzed. Practical graphs come in huge sizes, offer massive parallelism, and are stored in sparse-matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-18 Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei Hwu

CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling

Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-08 Dong Xu, Han Meng, Xinyu Chen, Dengcheng Zhu, Wei Tang, Fei Liu, Liguang Xie, Wu Xiang, Rui Shi, Yue Li, Henry Hu, Hui Zhang, Jianping Jiang, Dong Li

The Case for Persistent CXL switches

Compute Express Link (CXL) switch allows memory extension via PCIe physical layer to address increasing demand for larger memory capacities in data centers. However, CXL attached memory introduces 170ns to 400ns memory latency. This becomes…

Hardware Architecture · Computer Science 2025-03-14 Khan Shaikhul Hadi, Naveed Ul Mustafa, Mark Heinrich, Yan Solihin

CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach

In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-22 Yehonatan Fridman, Suprasad Mutalik Desai, Navneet Singh, Thomas Willhalm, Gal Oren

External Memory based Distributed Generation of Massive Scale Social Networks on Small Clusters

Small distributed systems are limited by their main memory to generate massively large graphs. Trivial extension to current graph generators to utilize external memory leads to large amount of random I/O hence do not scale with size. In…

Databases · Computer Science 2012-10-02 Sandeep Gupta

Next-Gen Computing Systems with Compute Express Link: a Comprehensive Survey

Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-21 Chen Chen, Xinkui Zhao, Guanjie Cheng, Yuesheng Xu, Shuiguang Deng, Jianwei Yin

ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

Compute Express Link (CXL) emerges as a solution for wide gap between computational speed and data communication rates among host and multiple devices. It fosters a unified and coherent memory space between host and CXL storage devices such…

Hardware Architecture · Computer Science 2024-08-13 Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion…

Performance · Computer Science 2023-10-06 Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim

CXL-Interference: Analysis and Characterization in Modern Computer Systems

Compute Express Link (CXL) is a promising technology that addresses memory and storage challenges. Despite its advantages, CXL faces performance threats from external interference when co-existing with current memory and storage systems.…

Hardware Architecture · Computer Science 2024-11-28 Shunyu Mao, Jiajun Luo, Yixin Li, Jiapeng Zhou, Weidong Zhang, Zheng Liu, Teng Ma, Shuwen Deng

An Introduction to the Compute Express Link (CXL) Interconnect

The Compute Express Link (CXL) is an open industry-standard interconnect between processors and devices such as accelerators, memory buffers, smart network interfaces, persistent memory, and solid-state drives. CXL offers coherency and…

Hardware Architecture · Computer Science 2024-05-09 Debendra Das Sharma, Robert Blankenship, Daniel S. Berger

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to…

Hardware Architecture · Computer Science 2025-05-27 Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung

Exploring Memory Access Patterns for Graph Processing Accelerators

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann, Daniel Ritter, Holger Fröning

Memory Sharing with CXL: Hardware and Software Design Approaches

Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…

Emerging Technologies · Computer Science 2024-04-05 Sunita Jain, Nagaradhesh Yeleswarapu, Hasan Al Maruf, Rita Gupta

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann, Daniel Ritter, Holger Fröning

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-27 Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay

CXLMemUring: A Hardware Software Co-design Paradigm for Asynchronous and Flexible Parallel CXL Memory Pool Access

CXL has been the emerging technology for expanding memory for both the host CPU and device accelerators with load/store interface. Extending memory coherency to the PCIe root complex makes the codesign more flexible in that you can access…

Hardware Architecture · Computer Science 2023-09-11 Yiwei Yang