Related papers: Retrospective: A Scalable Processing-in-Memory Acc…

PIMSAB: A Processing-In-Memory System with Spatially-Aware Communication and Bit-Serial-Aware Computation

Bit-serial Processing-In-Memory (PIM) is an attractive paradigm for accelerator architectures, for parallel workloads such as Deep Learning (DL), because of its capability to achieve massive data parallelism at a low area overhead and…

Hardware Architecture · Computer Science 2023-11-21 Aman Arora , Jian Weng , Siyuan Ma , Tony Nowatzki , Lizy K. John

A Modern Primer on Processing in Memory

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science 2025-02-07 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun , Mohammad Sadrosadati , Geraldo F. Oliveira

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System

Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-11 Marzieh Barkhordar , Alireza Tabatabaeian , Mohammad Sadrosadati , Christina Giannoula , Juan Gomez Luna , Izzat El Hajj , Onur Mutlu , Alaa R. Alameldeen

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems

Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are…

Hardware Architecture · Computer Science 2021-10-26 Maciej Besta , Raghavendra Kanakagiri , Grzegorz Kwasniewski , Rachata Ausavarungnirun , Jakub Beránek , Konstantinos Kanellopoulos , Kacper Janda , Zur Vonarburg-Shmaria , Lukas Gianinazzi , Ioana Stefan , Juan Gómez Luna , Marcin Copik , Lukas Kapp-Schwoerer , Salvatore Di Girolamo , Marek Konieczny , Nils Blach , Onur Mutlu , Torsten Hoefler

PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang , Xiaotian Sun , Yinhe Han , Xiaoming Chen

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM…

Hardware Architecture · Computer Science 2022-02-01 Weidong Cao , Yilong Zhao , Adith Boloor , Yinhe Han , Xuan Zhang , Li Jiang

TCIM: Triangle Counting Acceleration With Processing-In-MRAM Architecture

Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer…

Hardware Architecture · Computer Science 2020-07-22 Xueyan Wang , Jianlei Yang , Yinglin Zhao , Yingjie Qi , Meichen Liu , Xingzhou Cheng , Xiaotao Jia , Xiaoming Chen , Gang Qu , Weisheng Zhao

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather

Graph processing requires irregular, fine-grained random access patterns incompatible with contemporary off-chip memory architecture, leading to inefficient data access. This inefficiency makes graph processing an extremely memory-bound…

Hardware Architecture · Computer Science 2025-03-11 Changmin Shin , Jaeyong Song , Hongsun Jang , Dogeun Kim , Jun Sung , Taehee Kwon , Jae Hyung Ju , Frank Liu , Yeonkyu Choi , Jinho Lee

RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs

All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies. In this work, we propose RAPID-Graph to address…

Hardware Architecture · Computer Science 2026-01-29 Yanru Chen , Zheyu Li , Keming Fan , Runyang Tian , John Hsu , Weihong Xu , Minxuan Zhou , Tajana Rosing

PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

Processing in memory (PIM) moves computation into memories with the goal of improving throughput and energy-efficiency compared to traditional von Neumann-based architectures. Most existing PIM architectures are either general-purpose but…

Hardware Architecture · Computer Science 2019-07-23 Oscar Castañeda , Maria Bobbett , Alexandra Gallyas-Sanhueza , Christoph Studer

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures

The ability to dynamically allocate memory is fundamental in modern programming languages. However, this feature is not adequately supported in current general-purpose PIM devices. To identify key design principles that PIM must consider,…

Hardware Architecture · Computer Science 2026-01-28 Dongjae Lee , Bongjoon Hyun , Youngjin Kwon , Minsoo Rhu

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency…

Hardware Architecture · Computer Science 2022-05-06 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures

The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM…

Hardware Architecture · Computer Science 2024-01-18 Rebecca Pelke , Jose Cubero-Cascante , Nils Bosbach , Felix Staudigl , Rainer Leupers , Jan Moritz Joseph

PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers

Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation, by autoregressively predicting the next token. However, the efficacy of running GPT on current hardware systems is bounded by low…

Hardware Architecture · Computer Science 2024-04-16 Yuting Wu , Ziyu Wang , Wei D. Lu

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is…

Hardware Architecture · Computer Science 2023-03-28 Geraldo F. Oliveira , Juan Gómez-Luna , Saugata Ghose , Amirali Boroumand , Onur Mutlu

RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference

In-DRAM Processing-In-Memory (DRAM-PIM) has emerged as a promising approach to accelerate memory-intensive workloads by mitigating data transfer overhead between DRAM and the host processor. Bit-serial DRAM-PIM architectures, further…

Hardware Architecture · Computer Science 2025-12-11 Siyuan Ma , Jiajun Hu , Jeeho Ryoo , Aman Arora , Lizy Kurian John

A Workload and Programming Ease Driven Perspective of Processing-in-Memory

Many modern and emerging applications must process increasingly large volumes of data. Unfortunately, prevalent computing paradigms are not designed to efficiently handle such large-scale data: the energy and performance costs to move this…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-31 Saugata Ghose , Amirali Boroumand , Jeremie S. Kim , Juan Gómez-Luna , Onur Mutlu

PIM-STM: Software Transactional Memory for Processing-In-Memory Systems

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-18 André Lopes , Daniel Castro , Paolo Romano