Related papers: Reducing Load Latency with Cache Level Prediction

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang , Zhiwei Luo

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe , Miguel Avargues , Salvador Petit , Maria E. Gomez , Rui Yang , Guanhao Wang , Julio Sahuquillo

A Survey of Novel Cache Hierarchy Designs for High Workloads

Traditional on-die, three-level cache hierarchy design is very commonly used but is also prone to latency, especially at the Level 2 (L2) cache. We discuss three distinct ways of improving this design in order to have better performance.…

Hardware Architecture · Computer Science 2021-01-26 Pranjal Singh Rajput , Sonnya Dellarosa , Kanya Satis

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large…

Hardware Architecture · Computer Science 2022-10-03 Rahul Bera , Konstantinos Kanellopoulos , Shankar Balachandran , David Novo , Ataberk Olgun , Mohammad Sadrosadati , Onur Mutlu

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Data prefetching, i.e., the act of predicting application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour , Mehran Shakerinava , Fatemeh Golshan , Ali Ansari , Pejman Lotfi-Karman , Hamid Sarbazi-Azad

Pickle Prefetcher: Programmable and Scalable Last-Level Cache Prefetcher

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen , Pongstorn Maidee , Jason Lowe-Power , Alireza Kaviani

Prefetcher-based DRAM Architecture

Advancement in Processor technology has made it easy to handle data-intensive workloads, but limiting main memory advances has created performance bottlenecks. In DRAM, there have been improvements in DRAM access latency as well as…

Hardware Architecture · Computer Science 2021-05-24 Saurabh Jaiswal , Shailendra Kumar Gupta , Soumya Soubhagya Dandapat

Modeling and Optimizing Latency for Delayed Hit Caching with Stochastic Miss Latency

Caching is crucial for system performance, but the delayed hit phenomenon, where requests queue during lengthy fetches after a cache miss, significantly degrades user-perceived latency in modern high-throughput systems. While prior works…

Networking and Internet Architecture · Computer Science 2025-05-22 Bowen Jiang , Chaofan Ma

Pointer-Chase Prefetcher for Linked Data Structures

Caches only exploit spatial and temporal locality in a set of address referenced in a program. Due to dynamic construction of linked data-structures, they are difficult to cache as the spatial locality between the nodes is highly dependent…

Hardware Architecture · Computer Science 2018-01-25 Nitish Kumar Srivastava , Akshay Dilip Navalakha

A Dynamic Web Page Prediction Model Based on Access Patterns to Offer Better User Latency

The growth of the World Wide Web has emphasized the need for improvement in user latency. One of the techniques that are used for improving user latency is Caching and another is Web Prefetching. Approaches that bank solely on caching offer…

Networking and Internet Architecture · Computer Science 2011-02-04 Debajyoti Mukhopadhyay , Priyanka Mishra , Dwaipayan Saha , Young-Chon Kim

Lightweight ML-based Runtime Prefetcher Selection on Many-core Platforms

Modern computer designs support composite prefetching, where multiple individual prefetcher components are used to target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt…

Hardware Architecture · Computer Science 2023-07-18 Erika S. Alcorta , Mahesh Madhav , Scott Tetrick , Neeraja J. Yadwadkar , Andreas Gerstlauer

Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency

Hardware prefetching plays a critical role in hiding the off-chip DRAM latency. The complexity of applications results in a wide variety of memory access patterns, prompting the development of numerous cache-prefetching algorithms.…

Hardware Architecture · Computer Science 2025-03-26 Mengming Li , Qijun Zhang , Yongqing Ren , Zhiyao Xie

A Memory Hierarchical Layer Assigning and Prefetching Technique to Overcome the Memory Performance/Energy Bottleneck

The memory subsystem has always been a bottleneck in performance as well as significant power contributor in memory intensive applications. Many researchers have presented multi-layered memory hierarchies as a means to design energy and…

Hardware Architecture · Computer Science 2011-11-09 Minas Dasygenis , Erik Brockmeyer , Bart Durinck , Francky Catthoor , Dimitrios Soudris , Antonios Thanailakis

Cache Where you Want! Reconciling Predictability and Coherent Caching

Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-29 Ayoosh Bansal , Jayati Singh , Yifan Hao , Jen-Yang Wen , Renato Mancuso , Marco Caccamo

TransforMAP: Transformer for Memory Access Prediction

Data Prefetching is a technique that can hide memory latency by fetching data before it is needed by a program. Prefetching relies on accurate memory access prediction, to which task machine learning based methods are increasingly applied.…

Hardware Architecture · Computer Science 2022-05-31 Pengmiao Zhang , Ajitesh Srivastava , Anant V. Nori , Rajgopal Kannan , Viktor K. Prasanna

PREFENDER: A Prefetching Defender against Cache Side Channel Attacks as A Pretender

Cache side channel attacks are increasingly alarming in modern processors due to the recent emergence of Spectre and Meltdown attacks. A typical attack performs intentional cache access and manipulates cache states to leak secrets by…

Hardware Architecture · Computer Science 2024-05-16 Luyi Li , Jiayi Huang , Lang Feng , Zhongfeng Wang

Helper Without Threads: Customized Prefetching for Delinquent Irregular Loads

The growing memory footprints of cloud and big data applications mean that data center CPUs can spend significant time waiting for memory. An attractive approach to improving performance in such centralized compute settings is to employ…

Hardware Architecture · Computer Science 2020-09-02 Karthik Sankaranarayanan , Chit-Kwan Lin , Gautham Chinya

Improved Prefetching Techniques for Linked Data Structures

With ever-increasing main memory stall times, we need novel techniques to reduce effective memory access latencies. Prefetching has been shown to be an effective solution, especially with contiguous data structures that follow the…

Hardware Architecture · Computer Science 2025-05-29 Nikola Vuk Maruszewski

PUL: Pre-load in Software for Caches Wouldn't Always Play Along

Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…

Databases · Computer Science 2025-06-23 Arthur Bernhardt , Sajjad Tamimi , Florian Stock , Andreas Koch , Ilia Petrov

On Latency Predictors for Neural Architecture Search

Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint…

Machine Learning · Computer Science 2024-03-06 Yash Akhauri , Mohamed S. Abdelfattah