Related papers: A Framework for High-throughput Sequence Alignment…

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency…

Hardware Architecture · Computer Science 2022-05-06 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System

Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-11 Marzieh Barkhordar , Alireza Tabatabaeian , Mohammad Sadrosadati , Christina Giannoula , Juan Gomez Luna , Izzat El Hajj , Onur Mutlu , Alaa R. Alameldeen

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

A Customized Memory-aware Architecture for Biological Sequence Alignment

Sequence alignment is a fundamental process in computational biology which identifies regions of similarity in biological sequences. With the exponential growth in the volume of data in bioinformatics databases, the time, processing power,…

Hardware Architecture · Computer Science 2025-07-31 Nasrin Akbari , Mehdi Modarressi , Alireza Khadem

An Experimental Exploration of In-Memory Computing for Multi-Layer Perceptrons

In modern computer architectures, the performance of many memory-bound workloads (e.g., machine learning, graph processing, databases) is limited by the data movement bottleneck that emerges when transferring large amounts of data between…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-12 Pedro Carrinho , Hamid Moghadaspour , Oscar Ferraz , João Dinis Ferreira , Yann Falevoz , Vitor Silva , Gabriel Falcao

High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory

We show that the wavefront algorithm can achieve higher pairwise read alignment throughput on a UPMEM PIM system than on a server-grade multi-threaded CPU system.

Hardware Architecture · Computer Science 2022-04-26 Safaa Diab , Amir Nassereldine , Mohammed Alser , Juan Gómez Luna , Onur Mutlu , Izzat El Hajj

Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System

The widespread adoption of Large Language Models (LLMs) has exponentially increased the demand for efficient serving systems. With growing requests and context lengths, key-value (KV)-related operations, including attention computation and…

Hardware Architecture · Computer Science 2026-02-13 Lian Liu , Shixin Zhao , Yutian Zhou , Yintao He , Mengdi Wang , Yinhe Han , Ying Wang

Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics

Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for…

Hardware Architecture · Computer Science 2020-08-18 Brian Crafton , Samuel Spetalnick , Gauthaman Murali , Tushar Krishna , Sung-Kyu Lim , Arijit Raychowdhury

HE-PIM: Demystifying Homomorphic Operations on a Real-world Processing-in-Memory System

Homomorphic encryption (HE) enables computation over encrypted data, offering strong privacy guarantees for untrusted computing environments. Practical adoption remains limited by high computational complexity, large ciphertext sizes, and…

Cryptography and Security · Computer Science 2026-05-14 Harshita Gupta , Mayank Kabra , Jaewoo Park , Priyam Mehta , Phillip Widdowson , Tathagata Barik , Nisa Bostancı , Konstantinos Kanellopoulos , Juan Gómez-Luna , Antonio J. Peña , Mohammad Sadrosadati , Onur Mutlu

PIM-tree: A Skew-resistant Index for Processing-in-Memory

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck, by enabling low-latency memory access whose…

Databases · Computer Science 2022-11-22 Hongbo Kang , Yiwei Zhao , Guy E. Blelloch , Laxman Dhulipala , Yan Gu , Charles McGuffey , Phillip B. Gibbons

Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

Hardware Architecture · Computer Science 2025-04-10 Akhil Shekar , Kevin Gaffney , Martin Prammer , Khyati Kiyawat , Lingxi Wu , Helena Caminal , Zhenxing Fan , Yimin Gao , Ashish Venkat , José F. Martínez , Jignesh Patel , Kevin Skadron

PIM-STM: Software Transactional Memory for Processing-In-Memory Systems

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-18 André Lopes , Daniel Castro , Paolo Romano

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from…

Hardware Architecture · Computer Science 2023-09-07 Juan Gómez-Luna , Yuxin Guo , Sylvan Brocard , Julien Legriel , Remy Cimadomo , Geraldo F. Oliveira , Gagandeep Singh , Onur Mutlu

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a…

Hardware Architecture · Computer Science 2018-02-02 Saugata Ghose , Kevin Hsieh , Amirali Boroumand , Rachata Ausavarungnirun , Onur Mutlu

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun

PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System

Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance.…

Hardware Architecture · Computer Science 2024-09-30 Steve Rhyner , Haocong Luo , Juan Gómez-Luna , Mohammad Sadrosadati , Jiawei Jiang , Ataberk Olgun , Harshita Gupta , Ce Zhang , Onur Mutlu

A Mess of Memory System Benchmarking, Simulation and Application Profiling

The Memory stress (Mess) framework provides a unified view of the memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds…

Hardware Architecture · Computer Science 2024-12-10 Pouya Esmaili-Dokht , Francesco Sgherzi , Valeria Soldera Girelli , Isaac Boixaderas , Mariana Carmin , Alireza Monemi , Adria Armejach , Estanislao Mercadal , German Llort , Petar Radojkovic , Miquel Moreto , Judit Gimenez , Xavier Martorell , Eduard Ayguade , Jesus Labarta , Emanuele Confalonieri , Rishabh Dubey , Jason Adlard

Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder , Jiayi Huang , Sungkeun Kim , Abdullah Muzahid , Dylan Siegers , Chia-Che Tsai , Eun Jung Kim