Related papers: CXL Shared Memory Programming: Barely Distributed …
CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed. It enables cache-coherent, shared memory pools in a…
In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…
Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…
Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…
Compute Express Link (CXL) 3.0 and beyond allows the compute nodes of a cluster to share data with hardware cache coherence and at the granularity of a cache line. This enables shared-memory semantics for distributed computing, but…
CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory…
This paper proposes TRAININGCXL that can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory…
Persistent Memory (PM) introduces new opportunities for designing crash-consistent applications without the traditional storage overheads. However, ensuring crash consistency in PM demands intricate knowledge of CPU, cache, and memory…
Recent Serverless workloads tend to be largescaled/CPU-memory intensive, such as DL, graph applications, that require dynamic memory-to-compute resources provisioning. Meanwhile, recent solutions seek to design page management strategies…
Compute Express Link (CXL) is a promising technology that addresses memory and storage challenges. Despite its advantages, CXL faces performance threats from external interference when co-existing with current memory and storage systems.…
Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…
Memory resources in data centers generally suffer from low utilization and lack of dynamics. Memory disaggregation solves these problems by decoupling CPU and memory, which currently includes approaches based on RDMA or interconnection…
Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…
The emergence of CXL (Compute Express Link) promises to transform the status of interconnects between host and devices and in turn impact the design of all software layers. With its low overhead, low latency, and memory coherency…
The trend toward specialized processing devices such as TPUs, DPUs, GPUs, and FPGAs has exposed the weaknesses of PCIe in interconnecting these devices and their hosts. Several attempts have been proposed to improve, augment, or downright…
Heterogeneous memory technologies are increasingly important instruments in addressing the memory wall in HPC systems. While most are deployed in single node setups, CXL.mem is a technology that implements memories that can be attached to…
The Compute Express Link (CXL) is an open industry-standard interconnect between processors and devices such as accelerators, memory buffers, smart network interfaces, persistent memory, and solid-state drives. CXL offers coherency and…
Engram conditional memory has emerged as a promising component for LLMs by decoupling static knowledge lookup from dynamic computation. Since Engram exhibits sparse access patterns and supports prefetching, its massive embedding tables are…
The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion…
Datacenter applications often rely on remote procedure calls (RPCs) for fast, efficient, and secure communication. However, RPCs are slow, inefficient, and hard to use as they require expensive serialization and compression to communicate…