Related papers: Asynchronous Memory Access Unit for General Purpos…

Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because…

Hardware Architecture · Computer Science 2024-04-18 Luming Wang , Xu Zhang , Songyue Wang , Zhuolun Jiang , Tianyue Lu , Mingyu Chen , Siwei Luo , Keji Huang

Systems for Memory Disaggregation: Challenges & Opportunities

Memory disaggregation addresses memory imbalance in a cluster by decoupling CPU and memory allocations of applications while also increasing the effective memory capacity for (memory-intensive) applications beyond the local memory limit…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-07 Anil Yelam

The Future of Memory: Limits and Opportunities

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…

Hardware Architecture · Computer Science 2025-09-24 Samuel Dayo , Shuhan Liu , Peijing Li , Philip Levis , Subhasish Mitra , Thierry Tambe , David Tennenhouse , H. -S. Philip Wong

To Use or Not to Use: CPUs' Cache Optimization Techniques on GPGPUs

General Purpose Graphic Processing Unit(GPGPU) is used widely for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is very famous in the new era and mostly used for scientific computing which…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Vajira Thambawita , Roshan G. Ragel , Dhammike Elkaduwe

Memory Disaggregation: Advances and Open Challenges

Compute and memory are tightly coupled within each server in traditional datacenters. Large-scale datacenter operators have identified this coupling as a root cause behind fleet-wide resource underutilization and increasing Total Cost of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-09 Hasan Al Maruf , Mosharaf Chowdhury

Programmable FPGA-based Memory Controller

Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…

Hardware Architecture · Computer Science 2021-08-24 Sasindu Wijeratne , Sanket Pattnaik , Zhiyu Chen , Rajgopal Kannan , Viktor Prasanna

Near-Memory Computing: Past, Present, and Future

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Hardware Architecture · Computer Science 2019-08-08 Gagandeep Singh , Lorenzo Chelini , Stefano Corda , Ahsan Javed Awan , Sander Stuijk , Roel Jordans , Henk Corporaal , Albert-Jan Boonstra

Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers

Memory disaggregation is being considered as a strong alternative to traditional architecture to deal with the memory under-utilization in data centers. Disaggregated memory can adapt to dynamically changing memory requirements for the data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-11 Amit Puri , John Jose , Tamarapalli Venkatesh

Disaggregated and optically interconnected memory: when will it be cost effective?

The "Disaggregated Server" concept has been proposed for datacenters where the same type server resources are aggregated in their respective pools, for example a compute pool, memory pool, network pool, and a storage pool. Each server is…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-09 Bulent Abali , Richard J. Eickemeyer , Hubertus Franke , Chung-Sheng Li , Marc A. Taubenblatt

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann , Daniel Ritter , Holger Fröning

A Case for Asymmetric Non-Volatile Memory Architecture

The byte-addressable Non-Volatile Memory (NVM) is a promising technology since it simultaneously provides DRAM-like performance, disk-like capacity, and persistency. The current NVM deployment is symmetric, where NVM devices are directly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-31 Teng Ma , Mingxing Zhang , Kang Chen , Xuehai Qian , Yongwei Wu

Mainframe-Style Channel Controllers for Modern Disaggregated Memory Systems

Despite the promise of alleviating the main memory bottleneck, and the existence of commercial hardware implementations, techniques for Near-Data Processing have seen relatively little real-world deployment. The idea has received renewed…

Operating Systems · Computer Science 2025-09-08 Zikai Liu , Jasmin Schult , Pengcheng Xu , Timothy Roscoe

Exploring Memory Access Patterns for Graph Processing Accelerators

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann , Daniel Ritter , Holger Fröning

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

In-memory Associative Processors: Tutorial, Potential, and Challenges

In-memory computing is an emerging computing paradigm that overcomes the limitations of exiting Von-Neumann computing architectures such as the memory-wall bottleneck. In such paradigm, the computations are performed directly on the data…

Emerging Technologies · Computer Science 2022-04-14 Mohammed E. Fouda , Hasan Erdem Yantir , Ahmed M. Eltawil , Fadi Kurdahi

Fast, Multicore-Scalable, Low-Fragmentation Memory Allocation through Large Virtual Memory and Global Data Structures

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…

Programming Languages · Computer Science 2015-08-26 Martin Aigner , Christoph M. Kirsch , Michael Lippautz , Ana Sokolova

Analysis of Interference between RDMA and Local Access on Hybrid Memory System

We can use a hybrid memory system consisting of DRAM and Intel Optane DC Persistent Memory (We call it DCPM in this paper) as DCPM is now commercially available since April 2019. Even if the latency for DCPM is several times higher than…

Performance · Computer Science 2020-08-31 Kazuichi Oe

Hardware Memory Management for Future Mobile Hybrid Memory Systems

The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts…

Hardware Architecture · Computer Science 2024-03-19 Fei Wen , Mian Qin , Paul Gratz , Narasimha Reddy

Persistent Buffer Management with Optimistic Consistency

Finding the best way to leverage non-volatile memory (NVM) on modern database systems is still an open problem. The answer is far from trivial since the clear boundary between memory and storage present in most systems seems to be…

Databases · Computer Science 2019-08-21 Lucas Lersch , Wolfgang Lehner , Ismail Oukid

Towards Efficient OpenMP Strategies for Non-Uniform Architectures

Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan