Related papers: Enabling Efficient Transaction Processing on CXL-B…
Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…
The Compute Express Link (CXL) is an open industry-standard interconnect between processors and devices such as accelerators, memory buffers, smart network interfaces, persistent memory, and solid-state drives. CXL offers coherency and…
CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed. It enables cache-coherent, shared memory pools in a…
Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…
Compute Express Link (CXL) is a promising technology that addresses memory and storage challenges. Despite its advantages, CXL faces performance threats from external interference when co-existing with current memory and storage systems.…
The trend toward specialized processing devices such as TPUs, DPUs, GPUs, and FPGAs has exposed the weaknesses of PCIe in interconnecting these devices and their hosts. Several attempts have been proposed to improve, augment, or downright…
Modern AI workloads such as large language models (LLMs) and retrieval-augmented generation (RAG) impose severe demands on memory, communication bandwidth, and resource flexibility. Traditional GPU-centric architectures struggle to scale…
Compute Express Link (CXL) 3.0 and beyond allows the compute nodes of a cluster to share data with hardware cache coherence and at the granularity of a cache line. This enables shared-memory semantics for distributed computing, but…
Compute Express Link (CXL) switch allows memory extension via PCIe physical layer to address increasing demand for larger memory capacities in data centers. However, CXL attached memory introduces 170ns to 400ns memory latency. This becomes…
Modern workloads are demanding increasingly larger memory capacity. Compute Express Link (CXL)-based memory tiering has emerged as a promising solution for addressing this problem by utilizing traditional DRAM alongside slow-tier CXL memory…
The memory system is a major performance determinant for server processors. Ever-growing core counts and datasets demand higher bandwidth and capacity as well as lower latency from the memory system. To keep up with growing demands,…
Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requires software-level tiering for…
Conventional heterogeneous computing systems built on PCIe interconnects suffer from inefficient fine-grained host-device interactions and complex programming models. In recent years, many proprietary and open cache-coherent interconnect…
Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…
Recent Serverless workloads tend to be largescaled/CPU-memory intensive, such as DL, graph applications, that require dynamic memory-to-compute resources provisioning. Meanwhile, recent solutions seek to design page management strategies…
While Compute Express Link (CXL) enables support for cache-coherent shared memory among multiple nodes, it also introduces new types of failures--processes can fail before data does, or data might fail before a process does. The lack of a…
CXL (Compute Express Link) enables multiple hosts to share byte-addressable memory with hardware cache coherence, but no existing filesystem exploits this for lock-free multi-host coordination. We present DaxFS, a Linux filesystem for CXL…
Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL$.$mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…
Datacenter applications often rely on remote procedure calls (RPCs) for fast, efficient, and secure communication. However, RPCs are slow, inefficient, and hard to use as they require expensive serialization and compression to communicate…
Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…