Related papers: MIMS: Towards a Message Interface based Memory Sys…

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems

Processing-in-memory (PIM) has emerged as a promising solution for accelerating memory-intensive workloads as they provide high memory bandwidth to the processing units. This approach has drawn attention not only from the academic community…

Hardware Architecture · Computer Science 2024-09-11 Dongjae Lee, Bongjoon Hyun, Taehun Kim, Minsoo Rhu

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, Onur Mutlu

Memos: Revisiting Hybrid Memory Management in Modern Operating System

The emerging hybrid DRAM-NVM architecture is challenging the existing memory management mechanism in operating system. In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy including cache,…

Operating Systems · Computer Science 2017-03-23 Lei Liu, Mengyao Xie, Hao Yang

RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing

Efficient memory management in heterogeneous systems is increasingly challenging due to diverse compute architectures (e.g., CPU, GPU, FPGA) and dynamic task mappings not known at compile time. Existing approaches often require programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-29 Serhan Gener, Aditya Ukarande, Shilpa Mysore Srinivasa Murthy, Sahil Hassan, Joshua Mack, Chaitali Chakrabarti, Umit Ogras, Ali Akoglu

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems

When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores.…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun, Gabriel H. Loh, Lavanya Subramanian, Kevin Chang, Onur Mutlu

A Mess of Memory System Benchmarking, Simulation and Application Profiling

The Memory stress (Mess) framework provides a unified view of the memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds…

Hardware Architecture · Computer Science 2024-12-10 Pouya Esmaili-Dokht, Francesco Sgherzi, Valeria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Monemi, Adria Armejach, Estanislao Mercadal, German Llort, Petar Radojkovic, Miquel Moreto, Judit Gimenez, Xavier Martorell, Eduard Ayguade, Jesus Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System

Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-11 Marzieh Barkhordar, Alireza Tabatabaeian, Mohammad Sadrosadati, Christina Giannoula, Juan Gomez Luna, Izzat El Hajj, Onur Mutlu, Alaa R. Alameldeen

Optimizing and Exploring System Performance in Compact Processing-in-Memory-based Chips

Processing-in-memory (PIM) is a promising computing paradigm to tackle the "memory wall" challenge. However, PIM system-level benefits over traditional von Neumann architecture can be reduced when the memory array cannot fully store all the…

Hardware Architecture · Computer Science 2025-03-03 Peilin Chen, Xiaoxuan Yang

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency…

Hardware Architecture · Computer Science 2022-05-06 Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a…

Hardware Architecture · Computer Science 2018-02-02 Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices

Recent dual in-line memory modules (DIMMs) are starting to support processing-in-memory (PIM) by associating their memory banks with processing elements (PEs), allowing applications to overcome the data movement bottleneck by offloading…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-16 Si Ung Noh, Junguk Hong, Chaemin Lim, Seongyeon Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee

Hardware Memory Management for Future Mobile Hybrid Memory Systems

The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts…

Hardware Architecture · Computer Science 2024-03-19 Fei Wen, Mian Qin, Paul Gratz, Narasimha Reddy

HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices

Processing-in-Memory (PIM) architectures offer promising solutions for efficiently handling AI applications in energy-constrained edge environments. While traditional PIM designs enhance performance and energy efficiency by reducing data…

Hardware Architecture · Computer Science 2025-12-09 Sangmin Jeon, Kangju Lee, Kyeongwon Lee, Woojoo Lee

PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-based Long-Context LLM Inference System

The expansion of long-context Large Language Models (LLMs) creates significant memory system challenges. While Processing-in-Memory (PIM) is a promising accelerator, we identify that it suffers from critical inefficiencies when scaled to…

Hardware Architecture · Computer Science 2025-12-29 Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Gyeonggeun Jung, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, Jungwook Choi

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

The increasing prevalence and growing size of data in modern applications have led to high costs for computation in traditional processor-centric computing systems. Moving large volumes of data between memory devices (e.g., DRAM) and…

Hardware Architecture · Computer Science 2022-06-01 Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Onur Mutlu

Energy-Efficient Wireless Interconnection Framework for Multichip Systems with In-package Memory Stacks

Multichip systems with memory stacks and various processing chips are at the heart of platform based designs such as servers and embedded systems. Full utilization of the benefits of these integrated multichip systems need a seamless, and…

Hardware Architecture · Computer Science 2017-09-25 Md Shahriar Shamim, M Meraj Ahmed, Naseef Mansoor, Amlan Ganguly

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and…

Hardware Architecture · Computer Science 2023-11-21 Aman Arora, Jian Weng, Siyuan Ma, Tony Nowatzki, Lizy K. John

Understanding and Improving the Latency of DRAM-Based Memory Systems

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic…

Hardware Architecture · Computer Science 2017-12-25 Kevin K. Chang