English
Related papers

Related papers: Asynchronous Memory Access Unit: Exploiting Massiv…

200 papers

In future data centers, applications will make heavy use of far memory (including disaggregated memory pools and NVM). The access latency of far memory is more widely distributed than that of local memory accesses. This makes the efficiency…

Hardware Architecture · Computer Science 2021-12-28 Luming Wang , Xu Zhang , Tianyue Lu , Mingyu Chen

The main memory access latency has not much improved for more than two decades while the CPU performance had been exponentially increasing until recently. Approximate memory is a technique to reduce the DRAM access latency in return of…

Emerging Technologies · Computer Science 2021-01-27 Soramichi Akiyama , Ryota Shioya

Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a…

Hardware Architecture · Computer Science 2019-08-22 Shihao Song , Anup Das , Onur Mutlu , Nagarajan Kandasamy

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic…

Hardware Architecture · Computer Science 2017-12-25 Kevin K. Chang

Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target which complicates implementation and limits performance of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta , Torsten Hoefler

Modern out-of-order processors have increased capacity to exploit instruction level parallelism (ILP) and memory level parallelism (MLP), e.g., by using wide superscalar pipelines and vector execution units, as well as deep buffers for…

Programming Languages · Computer Science 2018-07-05 Vladimir Kiriansky , Haoran Xu , Martin Rinard , Saman Amarasinghe

The widespread adoption of Large Language Models (LLMs) has exponentially increased the demand for efficient serving systems. With growing requests and context lengths, key-value (KV)-related operations, including attention computation and…

Hardware Architecture · Computer Science 2026-02-13 Lian Liu , Shixin Zhao , Yutian Zhou , Yintao He , Mengdi Wang , Yinhe Han , Ying Wang

Memory load/store instructions consume an important part in execution time and energy consumption in domain-specific accelerators. For designing highly parallel systems, available parallelism at each granularity is extracted from the…

Hardware Architecture · Computer Science 2022-09-07 Khushal Sethi

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising in effectively interleaving tasks and hiding memory latency, but they struggle to…

Hardware Architecture · Computer Science 2025-11-20 Zhuolun Jiang , Songyue Wang , Xiaokun Pei , Tianyue Lu , Mingyu Chen

Parallel processing is considered as todays and future trend for improving performance of computers. Computing devices ranging from small embedded systems to big clusters of computers rely on parallelizing applications to reduce execution…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-27 Oussama Tahan

Sequence alignment is a memory bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory architectures alleviate this bottleneck by providing the memory with computing…

Hardware Architecture · Computer Science 2023-03-28 Safaa Diab , Amir Nassereldine , Mohammed Alser , Juan Gómez-Luna , Onur Mutlu , Izzat El Hajj

The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel…

Data Structures and Algorithms · Computer Science 2014-12-09 Arash Farzan , Alejandro López-Ortiz , Patrick K. Nicholson , Alejandro Salinger

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder , Jiayi Huang , Sungkeun Kim , Abdullah Muzahid , Dylan Siegers , Chia-Che Tsai , Eun Jung Kim

The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and complex reasoning. Benefiting from the strong capabilities of LLMs, recent research focus has…

Artificial Intelligence · Computer Science 2026-04-16 Weiquan Huang , Zixuan Wang , Hehai Lin , Sudong Wang , Bo Xu , Qian Li , Beier Zhu , Linyi Yang , Chengwei Qin

In-memory computing is an emerging computing paradigm that overcomes the limitations of exiting Von-Neumann computing architectures such as the memory-wall bottleneck. In such paradigm, the computations are performed directly on the data…

Emerging Technologies · Computer Science 2022-04-14 Mohammed E. Fouda , Hasan Erdem Yantir , Ahmed M. Eltawil , Fadi Kurdahi

Sequence alignment is a fundamental process in computational biology which identifies regions of similarity in biological sequences. With the exponential growth in the volume of data in bioinformatics databases, the time, processing power,…

Hardware Architecture · Computer Science 2025-07-31 Nasrin Akbari , Mehdi Modarressi , Alireza Khadem

Remote Memory Access (RMA) is an emerging mechanism for programming high-performance computers and datacenters. However, little work exists on resilience schemes for RMA-based applications and systems. In this paper we analyze fault…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta , Torsten Hoefler

This paper presents implementation details and empirical results for a hybrid message passing and shared memory paralleliziation of the adaptive integral method (AIM). AIM is implemented on a (near) petaflop supercomputing cluster of…

Computational Engineering, Finance, and Science · Computer Science 2010-10-08 Fangzhou Wei , Ali E. Yılmaz

Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One…

‹ Prev 1 2 3 10 Next ›