Related papers: On Memory Codelets: Prefetching, Recoding, Moving …

Learning Memory Access Patterns

The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations,…

Machine Learning · Computer Science 2018-03-16 Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

Chiplets and the Codelet Model

Hardware technology has recently evolved rapidly toward domain-specific applications and architectures. Soon, processors may be composed of a large collection of vendor-independent IP specialized for application-specific algorithms,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-14 Dawson Fox, Jose M Monsalve Diaz, Xiaoming Li

Pointer-Chase Prefetcher for Linked Data Structures

Caches only exploit spatial and temporal locality in the set of addresses referenced in a program. Due to the dynamic construction of linked data structures, they are difficult to cache, as the spatial locality between the nodes is highly dependent…

Hardware Architecture · Computer Science 2018-01-25 Nitish Kumar Srivastava, Akshay Dilip Navalakha

A Memory Hierarchical Layer Assigning and Prefetching Technique to Overcome the Memory Performance/Energy Bottleneck

The memory subsystem has always been a performance bottleneck as well as a significant power contributor in memory-intensive applications. Many researchers have presented multi-layered memory hierarchies as a means to design energy and…

Hardware Architecture · Computer Science 2011-11-09 Minas Dasygenis, Erik Brockmeyer, Bart Durinck, Francky Catthoor, Dimitrios Soudris, Antonios Thanailakis

Pickle Prefetcher: Programmable and Scalable Last-Level Cache Prefetcher

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen, Pongstorn Maidee, Jason Lowe-Power, Alireza Kaviani

A neural network memory prefetcher using semantic locality

Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-14 Leeor Peled, Uri Weiser, Yoav Etsion

Exploring DRAM Cache Prefetching for Pooled Memory

Hardware-based memory pooling, enabled by interconnect standards like CXL, has been gaining popularity among cloud providers and system integrators. While pooling memory resources has cost benefits, it comes at a penalty of increased…

Hardware Architecture · Computer Science 2024-06-24 Chandrahas Tirumalasetty, Narasimha Annapreddy

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Data prefetching, i.e., the act of predicting an application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour, Mehran Shakerinava, Fatemeh Golshan, Ali Ansari, Pejman Lotfi-Karman, Hamid Sarbazi-Azad

PUL: Pre-load in Software for Caches Wouldn't Always Play Along

Memory latencies and bandwidth are major factors limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…

Databases · Computer Science 2025-06-23 Arthur Bernhardt, Sajjad Tamimi, Florian Stock, Andreas Koch, Ilia Petrov

Fundamental Limits of Caching: Improved Bounds with Coded Prefetching

We consider a cache network in which a single server is connected to multiple users via a shared error-free link. The server has access to a database with $N$ files of equal length $F$, and serves $K$ users each with a cache memory of $MF$…

Information Theory · Computer Science 2017-05-24 Jesús Gómez-Vilardebó

Data Cache Prefetching with Perceptron Learning

A cache prefetcher eliminates many compulsory cache misses by fetching data from slower memory into a faster cache before it is actually required by the processor. Sophisticated prefetchers predict the next-use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang, Zhiwei Luo

Semantic prefetching using forecast slices

Modern prefetchers identify memory access patterns in order to predict future accesses. However, many applications exhibit irregular access patterns that do not manifest spatio-temporal locality in the memory address space. Such…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-14 Leeor Peled, Uri Weiser, Yoav Etsion

Fine-Grained Address Segmentation for Attention-Based Variable-Degree Prefetching

Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are based on the modeling of text prediction, considering prefetching as a…

Hardware Architecture · Computer Science 2022-05-06 Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna

Coding for Improved Throughput Performance in Network Switches

Network switches and routers need to serve packet writes and reads at rates that challenge the most advanced memory technologies. As a result, scaling the switching rates is commonly done by parallelizing the packet I/Os using multiple…

Networking and Internet Architecture · Computer Science 2016-05-17 Rami Cohen, Yuval Cassuto

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe, Miguel Avargues, Salvador Petit, Maria E. Gomez, Rui Yang, Guanhao Wang, Julio Sahuquillo

PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving

Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of a large number of devices. Unfortunately, communication between these devices incurs significant overhead, increasing the inference latency and cost…

Artificial Intelligence · Computer Science 2025-05-27 Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli

Deep Learning based Data Prefetching in CPU-GPU Unified Virtual Memory

Unified Virtual Memory (UVM) relieves developers of the onus of maintaining complex data structures and explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging soon…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Xinjian Long, Xiangyang Gong, Huiyang Zhou

Memory-Centric Computing: Solving Computing's Memory Problem

Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…

Hardware Architecture · Computer Science 2025-09-05 Onur Mutlu, Ataberk Olgun, Ismail Emir Yuksel

Reducing Peak Memory Usage for Modern Multimodal Large Language Model Pipelines

Multimodal large language models (MLLMs) have recently demonstrated strong capabilities in understanding and generating responses from diverse visual inputs, including high-resolution images and long video sequences. As these models scale…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Junwan Kim, Hyunkyung Bae

TransforMAP: Transformer for Memory Access Prediction

Data prefetching is a technique that can hide memory latency by fetching data before it is needed by a program. Prefetching relies on accurate memory access prediction, a task to which machine-learning-based methods are increasingly applied.…

Hardware Architecture · Computer Science 2022-05-31 Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna