Related papers: Vector In Memory Architecture for simple and high …

A Modern Primer on Processing in Memory

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science 2025-02-07 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun , Mohammad Sadrosadati , Geraldo F. Oliveira

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour , Marc Riera , Antonio González

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

Deep neural networks have become the standard approach to building reliable Natural Language Processing (NLP) applications, ranging from Neural Machine Translation (NMT) to dialogue systems. However, improving accuracy by increasing the…

Computation and Language · Computer Science 2020-10-19 Matthew Khoury , Rumen Dangovski , Longwu Ou , Preslav Nakov , Yichen Shen , Li Jing

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM…

Hardware Architecture · Computer Science 2022-02-01 Weidong Cao , Yilong Zhao , Adith Boloor , Yinhe Han , Xuan Zhang , Li Jiang

Enabling Practical Processing in and near Memory for Data-Intensive Computing

Modern computing systems suffer from the dichotomy between computation on one side, which is performed only in the processor (and accelerators), and data storage/movement on the other, which all other parts of the system are dedicated to.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-14 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun

ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration

Processing-in-memory (PIM) architectures are emerging to reduce data movement in data-intensive applications. These architectures seek to exploit the same physical devices for both information storage and logic, thereby dwarfing the…

Hardware Architecture · Computer Science 2023-05-09 Orian Leitersdorf , Ronny Ronen , Shahar Kvatinsky

Moving Processing to Data: On the Influence of Processing in Memory on Data Management

Near-Data Processing refers to an architectural hardware and software paradigm, based on the co-location of storage and compute units. Ideally, it will allow to execute application-defined data- or compute-intensive operations in-situ, i.e.…

Databases · Computer Science 2019-05-14 Tobias Vincon , Andreas Koch , Ilia Petrov

Adaptable Register File Organization for Vector Processors

Modern scientific applications are getting more diverse, and the vector lengths in those applications vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE…

Hardware Architecture · Computer Science 2022-05-31 Cristóbal Ramírez Lazo , Enrico Reggiani , Carlos Rojas Morales , Roger Figueras Bagué , Luis Alfonso Villa Vargas , Marco Antonio Ramírez Salinas , Mateo Valero Cortés , Osman Sabri Unsal , Adrián Cristal

Near-Memory Computing: Past, Present, and Future

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Hardware Architecture · Computer Science 2019-08-08 Gagandeep Singh , Lorenzo Chelini , Stefano Corda , Ahsan Javed Awan , Sander Stuijk , Roel Jordans , Henk Corporaal , Albert-Jan Boonstra

NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators with 3D-Stacked-DRAM

With the widespread use of deep neural networks(DNNs) in intelligent systems, DNN accelerators with high performance and energy efficiency are greatly demanded. As one of the feasible processing-in-memory(PIM) architectures,…

Hardware Architecture · Computer Science 2023-12-22 Junpeng Wang , Mengke Ge , Bo Ding , Qi Xu , Song Chen , Yi Kang

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

The increasing prevalence and growing size of data in modern applications have led to high costs for computation in traditional processor-centric computing systems. Moving large volumes of data between memory devices (e.g., DRAM) and…

Hardware Architecture · Computer Science 2022-06-01 Geraldo F. Oliveira , Juan Gómez-Luna , Saugata Ghose , Onur Mutlu

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is…

Hardware Architecture · Computer Science 2023-03-28 Geraldo F. Oliveira , Juan Gómez-Luna , Saugata Ghose , Amirali Boroumand , Onur Mutlu

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a…

Hardware Architecture · Computer Science 2018-02-02 Saugata Ghose , Kevin Hsieh , Amirali Boroumand , Rachata Ausavarungnirun , Onur Mutlu

Proxics: an efficient programming model for far memory accelerators

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu , Niels Pressel , Jasmin Schult , Roman Meier , Pengcheng Xu , Timothy Roscoe

OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration

Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann…

Hardware Architecture · Computer Science 2024-07-12 Febin Sunny , Amin Shafiee , Abhishek Balasubramaniam , Mahdi Nikdast , Sudeep Pasricha

Multiplier-free In-Memory Vector-Matrix Multiplication Using Distributed Arithmetic

Vector-Matrix Multiplication (VMM) is the fundamental and frequently required computation in inference of Neural Networks (NN). Due to the large data movement required during inference, VMM can benefit greatly from in-memory computing.…

Hardware Architecture · Computer Science 2025-10-03 Felix Zeller , John Reuben , Dietmar Fey

Exploiting long vectors with a CFD code: a co-design show case

A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Marc Blancafort , Roger Ferrer , Guillaume Houzeaux , Marta Garcia-Gasulla , Filippo Mantovani

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun

Processor in Non-Volatile Memory (PiNVSM): Towards to Data-centric Computing in Decentralized Environment

The AI problem has no solution in the environment of existing hardware stack and OS architecture. CPU-centric model of computation has a huge number of drawbacks that originate from memory hierarchy and obsolete architecture of the…

Operating Systems · Computer Science 2019-03-20 Viacheslav Dubeyko

ARCANE: Adaptive RISC-V Cache Architecture for Near-memory Extensions

Modern data-driven applications expose limitations of von Neumann architectures - extensive data movement, low throughput, and poor energy efficiency. Accelerators improve performance but lack flexibility and require data transfers.…

Hardware Architecture · Computer Science 2025-04-09 Vincenzo Petrolo , Flavia Guella , Michele Caon , Pasquale Davide Schiavone , Guido Masera , Maurizio Martina