English
Related papers

Related papers: ATiM: Autotuning Tensor Programs for Processing-in…

200 papers

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g.,…

Hardware Architecture · Computer Science 2023-10-04 Jinfan Chen , Juan Gómez-Luna , Izzat El Hajj , Yuxin Guo , Onur Mutlu

Processing-in-memory (PIM) has emerged as a promising solution for accelerating memory-intensive workloads as they provide high memory bandwidth to the processing units. This approach has drawn attention not only from the academic community…

Hardware Architecture · Computer Science 2024-09-11 Dongjae Lee , Bongjoon Hyun , Taehun Kim , Minsoo Rhu

Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation, by autoregressively predicting the next token. However, the efficacy of running GPT on current hardware systems is bounded by low…

Hardware Architecture · Computer Science 2024-04-16 Yuting Wu , Ziyu Wang , Wei D. Lu

Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing to offload computations to the PIM system, this architecture allows for circumventing the data-bottleneck problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-18 André Lopes , Daniel Castro , Paolo Romano

Large Language Models (LLMs) are increasingly deployed on edge devices with Neural Processing Units (NPUs), yet the decode phase remains memory-intensive, limiting performance. Processing-in-Memory (PIM) offers a promising solution, but…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Hai Huang

The deployment of large language models (LLMs) presents significant challenges due to their enormous memory footprints, low arithmetic intensity, and stringent latency requirements, particularly during the autoregressive decoding stage.…

Hardware Architecture · Computer Science 2025-11-03 Cenlin Duan , Jianlei Yang , Rubing Yang , Yikun Wang , Yiou Wang , Lingkun Long , Yingjie Qi , Xiaolin He , Ao Zhou , Xueyan Wang , Weisheng Zhao

Recently DRAM-based PIMs (processing-in-memories) with unmodified cell arrays have demonstrated impressive performance for accelerating AI applications. However, due to the very restrictive hardware constraints, PIM remains an accelerator…

Hardware Architecture · Computer Science 2023-10-17 Jaewoo Park , Sugil Lee , Jongeun Lee

Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. Although software development kits (SDKs) for PIM, such as the UPMEM SDK, provide…

Hardware Architecture · Computer Science 2025-10-21 Krystian Chmielewski , Jarosław Ławnicki , Uladzislau Lukyanau , Tadeusz Kobus , Maciej Maciejewski

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM…

Hardware Architecture · Computer Science 2022-02-01 Weidong Cao , Yilong Zhao , Adith Boloor , Yinhe Han , Xuan Zhang , Li Jiang

Large language models (LLMs) have revolutionized AI applications, yet their enormous computational demands severely limit deployment and real-time performance. Quantization methods can help reduce computational costs, however, attaining the…

Machine Learning · Computer Science 2025-09-03 Shaobo Ma , Chao Fang , Haikuo Shao , Zhongfeng Wang

The performance bottleneck of deep-learning-based recommender systems resides in their backbone Deep Neural Networks. By integrating Processing-In-Memory~(PIM) architectures, researchers can reduce data movement and enhance energy…

Hardware Architecture · Computer Science 2025-05-19 Feng Cheng , Tunhou Zhang , Junyao Zhang , Jonathan Hao-Cheng Ku , Yitu Wang , Xiaoxuan Yang , Hai , Li , Yiran Chen

Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by the processor-centric…

Cryptography and Security · Computer Science 2026-05-20 Nicola Barcarolo , Brahmaiah Gandham , Mohammad Sadrosadati , Roberto Passerone , Onur Mutlu , Flavio Vella

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco , Leonel Sousa , Aleksandar Ilic

The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overall computational cost. The…

Hardware Architecture · Computer Science 2024-03-01 Qingcai Jiang , Shaojie Tan , Junshi Chen , Hong An

Large Language Models (LLMs) have become essential in a variety of applications due to their advanced language understanding and generation capabilities. However, their computational and memory requirements pose significant challenges to…

Hardware Architecture · Computer Science 2024-12-02 Cristobal Ortega , Yann Falevoz , Renaud Ayrignac

In-DRAM Processing-In-Memory (DRAM-PIM) has emerged as a promising approach to accelerate memory-intensive workloads by mitigating data transfer overhead between DRAM and the host processor. Bit-serial DRAM-PIM architectures, further…

Hardware Architecture · Computer Science 2025-12-11 Siyuan Ma , Jiajun Hu , Jeeho Ryoo , Aman Arora , Lizy Kurian John

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new…

Processing In Memory (PIM) accelerators are promising architecture that can provide massive parallelization and high efficiency in various applications. Such architectures can instantaneously provide ultra-fast operation over extensive…

Hardware Architecture · Computer Science 2022-07-26 Kazi Abu Zubair , Sumit Kumar Jha , David Mohaisen , Clayton Hughes , Amro Awad

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder , Jiayi Huang , Sungkeun Kim , Abdullah Muzahid , Dylan Siegers , Chia-Che Tsai , Eun Jung Kim
‹ Prev 1 2 3 10 Next ›