English
Related papers

Related papers: Offloading to CXL-based Computational Memory

200 papers

Upcoming CXL-based disaggregated memory devices feature special purpose units to offload compute to near-memory. In this paper, we explore opportunities for offloading compute to general purpose cores on CXL memory devices, thereby enabling…

Emerging Technologies · Computer Science 2024-04-04 Jon Hermes , Josh Minor , Minjun Wu , Adarsh Patil , Eric Van Hensbergen

Heterogeneous memory technologies are increasingly important instruments in addressing the memory wall in HPC systems. While most are deployed in single node setups, CXL.mem is a technology that implements memories that can be attached to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-10 Stepan Vanecek , Matthew Turner , Manisha Gajbe , Matthew Wolf , Martin Schulz

CXL has been the emerging technology for expanding memory for both the host CPU and device accelerators with load/store interface. Extending memory coherency to the PCIe root complex makes the codesign more flexible in that you can access…

Hardware Architecture · Computer Science 2023-09-11 Yiwei Yang

CXL (Compute Express Link) is an emerging open industry-standard interconnect between processing and memory devices that is expected to revolutionize the way systems are designed. It enables cache-coherent, shared memory pools in a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-27 Gal Assa , Moritz Lumme , Lucas Bürgi , Michal Friedman , Ori Lahav

MPI implementations commonly rely on explicit memory-copy operations, incurring overhead from redundant data movement and buffer management. This overhead notably impacts HPC workloads involving intensive inter-processor communication. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-17 Miryeong Kwon , Donghyun Gouk , Hyein Woo , Junhee Kim , Jinwoo Baek , Kyungkuk Nam , Sangyoon Ji , Jiseon Kim , Hanyeoreum Bae , Junhyeok Jang , Hyunwoo You , Junseok Moon , Myoungsoo Jung

Recent Serverless workloads tend to be largescaled/CPU-memory intensive, such as DL, graph applications, that require dynamic memory-to-compute resources provisioning. Meanwhile, recent solutions seek to design page management strategies…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-26 Yuze Li , Shunyu Yao

Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force…

Emerging Technologies · Computer Science 2025-11-20 I-Ting Lee , Bao-Kai Wang , Liang-Chi Chen , Wen Sheng Lim , Da-Wei Chang , Yu-Ming Chang , Chieng-Chung Ho

Interconnection is crucial for computing systems. However, the current interconnection performance between processors and devices, such as memory devices and accelerators, significantly lags behind their computing performance, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-21 Chen Chen , Xinkui Zhao , Guanjie Cheng , Yuesheng Xu , Shuiguang Deng , Jianwei Yin

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…

Operating Systems · Computer Science 2026-01-13 Misun Park , Richi Dubey , Yifan Yuan , Nam Sung Kim , Ada Gavrilovska

Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding…

Emerging Technologies · Computer Science 2024-04-05 Sunita Jain , Nagaradhesh Yeleswarapu , Hasan Al Maruf , Rita Gupta

CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory…

Performance · Computer Science 2025-06-18 Yiwei Yang , Brian Zhao , Yusheng Zheng , Pooneh Safayenikoo , Tanvir Ahmed Khan , Andi Quinn

Memory disaggregation is an emerging technology that decouples memory from traditional memory buses, enabling independent scaling of compute and memory. Compute Express Link (CXL), an open-standard interconnect technology, facilitates…

Hardware Architecture · Computer Science 2025-03-27 Yujie Yang , Lingfeng Xiang , Peiran Du , Zhen Lin , Weishu Deng , Ren Wang , Andrey Kudryavtsev , Louis Ko , Hui Lu , Jia Rao

In our exploration of Composable Memory systems utilizing CXL, we focus on overcoming adoption barriers at Hyperscale, underscored by economic models demonstrating Total Cost of Ownership (TCO). While CXL addresses the pressing memory…

Emerging Technologies · Computer Science 2024-04-05 Angelos Arelakis , Nilesh Shah , Yiannis Nikolakopoulos , Dimitrios Palyvos-Giannas

Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. The Compute Express Link (CXL) shared memory pool offers a scalable solution by enabling…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-08 Dong Xu , Han Meng , Xinyu Chen , Dengcheng Zhu , Wei Tang , Fei Liu , Liguang Xie , Wu Xiang , Rui Shi , Yue Li , Henry Hu , Hui Zhang , Jianping Jiang , Dong Li

Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM with high peak to average utilization ratios, resulting in overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet,…

Hardware Architecture · Computer Science 2026-05-28 Kaustav Goswami , Maryam Babaie , Hoa Nguyen , Venkatesh Akella , Jason Lowe-Power

Contrast maximization (CMAX) is a direct geometric framework for event-based motion estimation, but its iterative warp-and-accumulate pipeline incurs input-dependent computation and frequent memory accesses, challenging real-time, low-power…

Hardware Architecture · Computer Science 2026-05-26 Kyeongpil Min , Jongin Choi , Kyeongwon Lee , Woojoo Lee

Compute eXpress Link (CXL) is emerging as a promising memory interface technology. However, its performance characteristics remain largely unclear due to the limited availability of production hardware. Key questions include: What are the…

Performance · Computer Science 2025-10-14 Xi Wang , Jie Liu , Jianbo Wu , Shuangyan Yang , Jie Ren , Bhanu Shankar , Dong Li

The growing prevalence of data-intensive workloads, such as artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), in-memory databases, and real-time analytics, has exposed limitations in conventional memory…

In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-22 Yehonatan Fridman , Suprasad Mutalik Desai , Navneet Singh , Thomas Willhalm , Gal Oren

The expansion of context windows in large language models (LLMs) to multi-million tokens introduces severe memory and compute bottlenecks, particularly in managing the growing Key-Value (KV) cache. While Compute Express Link (CXL) enables…

‹ Prev 1 2 3 10 Next ›