English
Related papers

Related papers: GPU Graph Processing on CXL-Based Microsecond-Late…

200 papers

This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL…

Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…

Performance · Computer Science 2025-10-14 Xi Wang , Jie Liu , Jianbo Wu , Shuangyan Yang , Jie Ren , Bhanu Shankar , Dong Li

Memory resources in data centers generally suffer from low utilization and lack of dynamics. Memory disaggregation solves these problems by decoupling CPU and memory, which currently includes approaches based on RDMA or interconnection…

Hardware Architecture · Computer Science 2023-02-23 Chenjiu Wang , Ke He , Ruiqi Fan , Xiaonan Wang , Yang Kong , Wei Wang , Qinfen Hao

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL$.$mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Modern analytics and recommendation systems are increasingly based on graph data that capture the relations between entities being analyzed. Practical graphs come in huge sizes, offer massive parallelism, and are stored in sparse-matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-18 Seung Won Min , Vikram Sharma Mailthody , Zaid Qureshi , Jinjun Xiong , Eiman Ebrahimi , Wen-mei Hwu

Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-08 Dong Xu , Han Meng , Xinyu Chen , Dengcheng Zhu , Wei Tang , Fei Liu , Liguang Xie , Wu Xiang , Rui Shi , Yue Li , Henry Hu , Hui Zhang , Jianping Jiang , Dong Li

Compute Express Link (CXL) switch allows memory extension via PCIe physical layer to address increasing demand for larger memory capacities in data centers. However, CXL attached memory introduces 170ns to 400ns memory latency. This becomes…

Hardware Architecture · Computer Science 2025-03-14 Khan Shaikhul Hadi , Naveed Ul Mustafa , Mark Heinrich , Yan Solihin

In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-22 Yehonatan Fridman , Suprasad Mutalik Desai , Navneet Singh , Thomas Willhalm , Gal Oren

Small distributed systems are limited by their main memory to generate massively large graphs. Trivial extension to current graph generators to utilize external memory leads to large amount of random I/O hence do not scale with size. In…

Databases · Computer Science 2012-10-02 Sandeep Gupta

Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-21 Chen Chen , Xinkui Zhao , Guanjie Cheng , Yuesheng Xu , Shuiguang Deng , Jianwei Yin

Compute Express Link (CXL) emerges as a solution for wide gap between computational speed and data communication rates among host and multiple devices. It fosters a unified and coherent memory space between host and CXL storage devices such…

The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion…

Compute Express Link (CXL) is a promising technology that addresses memory and storage challenges. Despite its advantages, CXL faces performance threats from external interference when co-existing with current memory and storage systems.…

Hardware Architecture · Computer Science 2024-11-28 Shunyu Mao , Jiajun Luo , Yixin Li , Jiapeng Zhou , Weidong Zhang , Zheng Liu , Teng Ma , Shuwen Deng

The Compute Express Link (CXL) is an open industry-standard interconnect between processors and devices such as accelerators, memory buffers, smart network interfaces, persistent memory, and solid-state drives. CXL offers coherency and…

Hardware Architecture · Computer Science 2024-05-09 Debendra Das Sharma , Robert Blankenship , Daniel S. Berger

Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to…

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann , Daniel Ritter , Holger Fröning

Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…

Emerging Technologies · Computer Science 2024-04-05 Sunita Jain , Nagaradhesh Yeleswarapu , Hasan Al Maruf , Rita Gupta

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann , Daniel Ritter , Holger Fröning

Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-27 Da Zheng , Disa Mhembere , Randal Burns , Joshua Vogelstein , Carey E. Priebe , Alexander S. Szalay

CXL has been the emerging technology for expanding memory for both the host CPU and device accelerators with load/store interface. Extending memory coherency to the PCIe root complex makes the codesign more flexible in that you can access…

Hardware Architecture · Computer Science 2023-09-11 Yiwei Yang
‹ Prev 1 2 3 10 Next ›