Related papers: Offloading to CXL-based Computational Memory
Upcoming CXL-based disaggregated memory devices feature special purpose units to offload compute to near-memory. In this paper, we explore opportunities for offloading compute to general purpose cores on CXL memory devices, thereby enabling…
Heterogeneous memory technologies are increasingly important instruments in addressing the memory wall in HPC systems. While most are deployed in single node setups, CXL.mem is a technology that implements memories that can be attached to…
CXL has been the emerging technology for expanding memory for both the host CPU and device accelerators with load/store interface. Extending memory coherency to the PCIe root complex makes the codesign more flexible in that you can access…
CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed. It enables cache-coherent, shared memory pools in a…
MPI implementations commonly rely on explicit memory-copy operations, incurring overhead from redundant data movement and buffer management. This overhead notably impacts HPC workloads involving intensive inter-processor communication. In…
Recent Serverless workloads tend to be largescaled/CPU-memory intensive, such as DL, graph applications, that require dynamic memory-to-compute resources provisioning. Meanwhile, recent solutions seek to design page management strategies…
Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force…
Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…
As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…
Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…
CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory…
Memory disaggregation is an emerging technology that decouples memory from traditional memory buses, enabling independent scaling of compute and memory. Compute Express Link (CXL), an open-standard interconnect technology, facilitates…
In our exploration of Composable Memory systems utilizing CXL, we focus on overcoming adoption barriers at Hyperscale, underscored by economic models demonstrating Total Cost of Ownership (TCO). While CXL addresses the pressing memory…
Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…
Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM with high peak to average utilization ratios, resulting in overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet,…
Contrast maximization (CMAX) is a direct geometric framework for event-based motion estimation, but its iterative warp-and-accumulate pipeline incurs input-dependent computation and frequent memory accesses, challenging real-time, low-power…
Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…
The growing prevalence of data-intensive workloads, such as artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), in-memory databases, and real-time analytics, has exposed limitations in conventional memory…
In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…
The expansion of context windows in large language models (LLMs) to multi-million tokens introduces severe memory and compute bottlenecks, particularly in managing the growing Key-Value (KV) cache. While Compute Express Link (CXL) enables…