English
Related papers

Related papers: ALPHA-PIM: Analysis of Linear Algebraic Processing…

200 papers

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency…

Hardware Architecture · Computer Science 2022-05-06 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

Processing-in-memory (PIM) has emerged as a promising solution for accelerating memory-intensive workloads as they provide high memory bandwidth to the processing units. This approach has drawn attention not only from the academic community…

Hardware Architecture · Computer Science 2024-09-11 Dongjae Lee , Bongjoon Hyun , Taehun Kim , Minsoo Rhu

Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a…

Hardware Architecture · Computer Science 2018-02-02 Saugata Ghose , Kevin Hsieh , Amirali Boroumand , Rachata Ausavarungnirun , Onur Mutlu

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

Today's computing systems require moving data back-and-forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major…

Hardware Architecture · Computer Science 2022-05-31 Geraldo F. Oliveira , Amirali Boroumand , Saugata Ghose , Juan Gómez-Luna , Onur Mutlu

Database Management Systems (DBMSs) are crucial for efficient data management and analytics, and are used in several different application domains. Due to the increasing volume of data a DBMS deals with, current processor-centric…

Hardware Architecture · Computer Science 2025-04-03 Manos Frouzakis , Juan Gómez-Luna , Geraldo F. Oliveira , Mohammad Sadrosadati , Onur Mutlu

The increasing prevalence and growing size of data in modern applications have led to high costs for computation in traditional processor-centric computing systems. Moving large volumes of data between memory devices (e.g., DRAM) and…

Hardware Architecture · Computer Science 2022-06-01 Geraldo F. Oliveira , Juan Gómez-Luna , Saugata Ghose , Onur Mutlu

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science 2025-02-07 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun , Mohammad Sadrosadati , Geraldo F. Oliveira

Graph mining applications, such as subgraph pattern matching and mining, are widely used in real-world domains such as bioinformatics, social network analysis, and computer vision. Such applications are considered a new class of…

Hardware Architecture · Computer Science 2023-06-21 Jiya Su , Peng Jiang , Rujia Wang

Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance.…

Hardware Architecture · Computer Science 2024-09-30 Steve Rhyner , Haocong Luo , Juan Gómez-Luna , Mohammad Sadrosadati , Jiawei Jiang , Ataberk Olgun , Harshita Gupta , Ce Zhang , Onur Mutlu

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun

Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly…

Hardware Architecture · Computer Science 2022-08-04 Juan Gómez-Luna , Yuxin Guo , Sylvan Brocard , Julien Legriel , Remy Cimadomo , Geraldo F. Oliveira , Gagandeep Singh , Onur Mutlu

Graph Neural Networks (GNNs) are emerging ML models to analyze graph-structure data. Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly…

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-18 André Lopes , Daniel Castro , Paolo Romano

Many modern and emerging applications must process increasingly large volumes of data. Unfortunately, prevalent computing paradigms are not designed to efficiently handle such large-scale data: the energy and performance costs to move this…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-07-31 Saugata Ghose , Amirali Boroumand , Jeremie S. Kim , Juan Gómez-Luna , Onur Mutlu

Processing-in-memory (PIM) architectures bring computation closer to data, reducing the processor-memory transfer bottleneck in traditional processor-centric designs. Novel hardware solutions, such as UPMEM's in-memory processing…

Emerging Technologies · Computer Science 2026-04-10 Peterson Yuhala , Mpoki Mwaisela , Pascal Felber , Valerio Schiavoni

Processing-in-Memory (PIM) enhances memory with computational capabilities, potentially solving energy and latency issues associated with data transfer between memory and processors. However, managing concurrent computation and data flow…

Hardware Architecture · Computer Science 2025-05-09 Ahmed Mamdouh , Haoran Geng , Michael Niemier , Xiaobo Sharon Hu , Dayane Reis
‹ Prev 1 2 3 10 Next ›