Related papers: ATiM: Autotuning Tensor Programs for Processing-in…

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems

Processing-in-memory (PIM) has emerged as a promising solution for accelerating memory-intensive workloads as they provide high memory bandwidth to the processing units. This approach has drawn attention not only from the academic community…

Hardware Architecture · Computer Science 2024-09-11 Dongjae Lee , Bongjoon Hyun , Taehun Kim , Minsoo Rhu

PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers

Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation, by autoregressively predicting the next token. However, the efficacy of running GPT on current hardware systems is bounded by low…

Hardware Architecture · Computer Science 2024-04-16 Yuting Wu , Ziyu Wang , Wei D. Lu

PIM-STM: Software Transactional Memory for Processing-In-Memory Systems

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-18 André Lopes , Daniel Castro , Paolo Romano

UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM

Large Language Models (LLMs) are increasingly deployed on edge devices with Neural Processing Units (NPUs), yet the decode phase remains memory-intensive, limiting performance. Processing-in-Memory (PIM) offers a promising solution, but…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Hai Huang

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference

The deployment of large language models (LLMs) presents significant challenges due to their enormous memory footprints, low arithmetic intensity, and stringent latency requirements, particularly during the autoregressive decoding stage.…

Hardware Architecture · Computer Science 2025-11-03 Cenlin Duan , Jianlei Yang , Rubing Yang , Yikun Wang , Yiou Wang , Lingkun Long , Yingjie Qi , Xiaolin He , Ao Zhou , Xueyan Wang , Weisheng Zhao

NTT-PIM: Row-Centric Architecture and Mapping for Efficient Number-Theoretic Transform on PIM

Recently DRAM-based PIMs (processing-in-memories) with unmodified cell arrays have demonstrated impressive performance for accelerating AI applications. However, due to the very restrictive hardware constraints, PIM remains an accelerator…

Hardware Architecture · Computer Science 2023-10-17 Jaewoo Park , Sugil Lee , Jongeun Lee

UPMEM Unleashed: Software Secrets for Speed

Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. Although software development kits (SDKs) for PIM, such as the UPMEM SDK, provide…

Hardware Architecture · Computer Science 2025-10-21 Krystian Chmielewski , Jarosław Ławnicki , Uladzislau Lukyanau , Tadeusz Kobus , Maciej Maciejewski

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM…

Hardware Architecture · Computer Science 2022-02-01 Weidong Cao , Yilong Zhao , Adith Boloor , Yinhe Han , Xuan Zhang , Li Jiang

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration

Large language models (LLMs) have revolutionized AI applications, yet their enormous computational demands severely limit deployment and real-time performance. Quantization methods can help reduce computational costs, however, attaining the…

Machine Learning · Computer Science 2025-09-03 Shaobo Ma , Chao Fang , Haikuo Shao , Zhongfeng Wang

AutoRAC: Automated Processing-in-Memory Accelerator Design for Recommender Systems

The performance bottleneck of deep-learning-based recommender systems resides in their backbone Deep Neural Networks. By integrating Processing-In-Memory~(PIM) architectures, researchers can reduce data movement and enhance energy…

Hardware Architecture · Computer Science 2025-05-19 Feng Cheng , Tunhou Zhang , Junyao Zhang , Jonathan Hao-Cheng Ku , Yitu Wang , Xiaoxuan Yang , Hai , Li , Yiran Chen

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco , Leonel Sousa , Aleksandar Ilic

A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader

The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overall computational cost. The…

Hardware Architecture · Computer Science 2024-03-01 Qingcai Jiang , Shaojie Tan , Junshi Chen , Hong An

PIM-AI: A Novel Architecture for High-Efficiency LLM Inference

Large Language Models (LLMs) have become essential in a variety of applications due to their advanced language understanding and generation capabilities. However, their computational and memory requirements pose significant challenges to…

Hardware Architecture · Computer Science 2024-12-02 Cristobal Ortega , Yann Falevoz , Renaud Ayrignac

RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference

In-DRAM Processing-In-Memory (DRAM-PIM) has emerged as a promising approach to accelerate memory-intensive workloads by mitigating data transfer overhead between DRAM and the host processor. Bit-serial DRAM-PIM architectures, further…

Hardware Architecture · Computer Science 2025-12-11 Siyuan Ma , Jiajun Hu , Jeeho Ryoo , Aman Arora , Lizy Kurian John

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new…

Machine Learning · Computer Science 2018-10-09 Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Meghan Cowan , Haichen Shen , Leyuan Wang , Yuwei Hu , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

FAT-PIM: Low-Cost Error Detection for Processing-In-Memory

Processing In Memory (PIM) accelerators are promising architecture that can provide massive parallelization and high efficiency in various applications. Such architectures can instantaneously provide ultra-fast operation over extensive…

Hardware Architecture · Computer Science 2022-07-26 Kazi Abu Zubair , Sumit Kumar Jha , David Mohaisen , Clayton Hughes , Amro Awad

Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

Hardware Architecture · Computer Science 2025-04-10 Akhil Shekar , Kevin Gaffney , Martin Prammer , Khyati Kiyawat , Lingxi Wu , Helena Caminal , Zhenxing Fan , Yimin Gao , Ashish Venkat , José F. Martínez , Jignesh Patel , Kevin Skadron

Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder , Jiayi Huang , Sungkeun Kim , Abdullah Muzahid , Dylan Siegers , Chia-Che Tsai , Eun Jung Kim