Related papers: DSPatch: Dual Spatial Pattern Prefetcher

SPPAM: Signature Pattern Prediction and Access-Map Prefetcher

The discrepancy between processor speed and memory system performance continues to limit the performance of many workloads. To address the issue, one effective and well studied technique is cache prefetching. Many prefetching designs have…

Hardware Architecture · Computer Science 2026-02-10 Maccoy Merrell , Lei Wang , Stavros Kalafatis , Paul V. Gratz

Design Guidelines for High-Performance SCM Hierarchies

With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature…

Hardware Architecture · Computer Science 2019-03-11 Dmitrii Ustiugov , Alexandros Daglis , Javier Picorel , Mark Sutherland , Edouard Bugnion , Babak Falsafi , Dionisios Pnevmatikatos

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe , Miguel Avargues , Salvador Petit , Maria E. Gomez , Rui Yang , Guanhao Wang , Julio Sahuquillo

Pickle Prefetcher: Programmable and Scalable Last-Level Cache Prefetcher

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen , Pongstorn Maidee , Jason Lowe-Power , Alireza Kaviani

An Approach to Data Prefetching Using 2-Dimensional Selection Criteria

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based…

Hardware Architecture · Computer Science 2015-05-18 Jean Sung , Sebastian Krupa , Andrew Fishberg , Josef Spjut

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang , Zhiwei Luo

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Data prefetching, i.e., the act of predicting application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour , Mehran Shakerinava , Fatemeh Golshan , Ali Ansari , Pejman Lotfi-Karman , Hamid Sarbazi-Azad

Reducing Load Latency with Cache Level Prediction

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili , Mattan Erez

Understanding and Improving the Latency of DRAM-Based Memory Systems

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic…

Hardware Architecture · Computer Science 2017-12-25 Kevin K. Chang

Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation

Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising…

Hardware Architecture · Computer Science 2017-04-11 Xiangyao Yu , Christopher J. Hughes , Nadathur Satish , Onur Mutlu , Srinivas Devadas

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou , Chengtao Lai , Yuhang Gu , Wei Zhang

Exploring DRAM Cache Prefetching for Pooled Memory

Hardware based memory pooling enabled by interconnect standards like CXL have been gaining popularity amongst cloud providers and system integrators. While pooling memory resources has cost benefits, it comes at a penalty of increased…

Hardware Architecture · Computer Science 2024-06-24 Chandrahas Tirumalasetty , Narasimha Annapreddy

Distance based prefetching algorithms for mining of the sporadic requests associations

Modern storage systems intensively utilize data prefetching algorithms while processing sequences of the read requests. Performance of the prefetching algorithm (for instance increase of the cache hit ratio of the cache system - CHR)…

Databases · Computer Science 2024-06-14 Vadim Voevodkin , Andrey Sokolov

At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-17 Jens Domke , Emil Vatai , Balazs Gerofi , Yuetsu Kodama , Mohamed Wahib , Artur Podobas , Sparsh Mittal , Miquel Pericàs , Lingqi Zhang , Peng Chen , Aleksandr Drozd , Satoshi Matsuoka

Multilevel Bidirectional Cache Filter

Modern caches are often required to handle a massive amount of data, which exceeds the amount of available memory; thus, hybrid caches, specifically DRAM/SSD combination, become more and more prevalent. In such environments, in addition to…

Operating Systems · Computer Science 2022-06-28 Ohad Eytan , Roy Friedman

Prefetcher-based DRAM Architecture

Advancement in Processor technology has made it easy to handle data-intensive workloads, but limiting main memory advances has created performance bottlenecks. In DRAM, there have been improvements in DRAM access latency as well as…

Hardware Architecture · Computer Science 2021-05-24 Saurabh Jaiswal , Shailendra Kumar Gupta , Soumya Soubhagya Dandapat

Dash: Scalable Hashing on Persistent Memory

Byte-addressable persistent memory (PM) brings hash tables the potential of low latency, cheap persistence and instant recovery. The recent advent of Intel Optane DC Persistent Memory Modules (DCPMM) further accelerates this trend. Many new…

Databases · Computer Science 2020-10-30 Baotong Lu , Xiangpeng Hao , Tianzheng Wang , Eric Lo

Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing

As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-07 Yehonatan Fridman , Yaniv Snir , Matan Rusanovsky , Kfir Zvi , Harel Levin , Danny Hendler , Hagit Attiya , Gal Oren

DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

Hardware Architecture · Computer Science 2026-04-10 Zhiwen Mo , Guoyu Li , Hao Mark Chen , Yu Cheng , Zhengju Tang , Qianzhou Wang , Lei Wang , Shuang Liang , Lingxiao Ma , Xianqi Zhou , Yuxiao Guo , Wayne Luk , Jilong Xue , Hongxiang Fan

The Future of Memory: Limits and Opportunities

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…

Hardware Architecture · Computer Science 2025-09-24 Samuel Dayo , Shuhan Liu , Peijing Li , Philip Levis , Subhasish Mitra , Thierry Tambe , David Tennenhouse , H. -S. Philip Wong