Related papers: Fine-Grained Address Segmentation for Attention-Ba…

TransforMAP: Transformer for Memory Access Prediction

Data Prefetching is a technique that can hide memory latency by fetching data before it is needed by a program. Prefetching relies on accurate memory access prediction, to which task machine learning based methods are increasingly applied.…

Hardware Architecture · Computer Science 2022-05-31 Pengmiao Zhang, Ajitesh Srivastava, Anant V. Nori, Rajgopal Kannan, Viktor K. Prasanna

Deep Learning based Data Prefetching in CPU-GPU Unified Virtual Memory

Unified Virtual Memory (UVM) relieves the developers from the onus of maintaining complex data structures and explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging soon…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Xinjian Long, Xiangyang Gong, Huiyang Zhou

A neural network memory prefetcher using semantic locality

Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-14 Leeor Peled, Uri Weiser, Yoav Etsion

An Approach to Data Prefetching Using 2-Dimensional Selection Criteria

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based…

Hardware Architecture · Computer Science 2015-05-18 Jean Sung, Sebastian Krupa, Andrew Fishberg, Josef Spjut

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang, Zhiwei Luo

GrASP: A Generalizable Address-based Semantic Prefetcher for Scalable Transactional and Analytical Workloads

Data prefetching--loading data into the cache before it is requested--is essential for reducing I/O overhead and improving database performance. While traditional prefetchers focus on sequential patterns, recent learning-based approaches,…

Databases · Computer Science 2025-10-14 Farzaneh Zirak, Farhana Choudhury, Renata Borovica-Gajic

Reducing Load Latency with Cache Level Prediction

High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…

Hardware Architecture · Computer Science 2021-03-30 Majid Jalili, Mattan Erez

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention

Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, demonstrated the ability to achieve performance…

Computation and Language · Computer Science 2026-04-21 Haonan Wang, Brian Chen, Siquan Li, Xinhe Liang, Hwee Kuan Lee, Kenji Kawaguchi, Tianyang Hu

Semantic prefetching using forecast slices

Modern prefetchers identify memory access patterns in order to predict future accesses. However, many applications exhibit irregular access patterns that do not manifest spatio-temporal locality in the memory address space. Such…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-14 Leeor Peled, Uri Weiser, Yoav Etsion

Accelerating Graph Analytics on a Reconfigurable Architecture with a Data-Indirect Prefetcher

The irregular nature of memory accesses of graph workloads makes their performance poor on modern computing platforms. On manycore reconfigurable architectures (MRAs), in particular, even state-of-the-art graph prefetchers do not work well…

Hardware Architecture · Computer Science 2023-01-31 Yichen Yang, Jingtao Li, Nishil Talati, Subhankar Pal, Siying Feng, Chaitali Chakrabarti, Trevor Mudge, Ronald Dreslinski

Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Attention-based Neural Networks (NN) have demonstrated their effectiveness in accurate memory access prediction, an essential step in data prefetching. However, the substantial computational overheads associated with these models result in…

Neural and Evolutionary Computing · Computer Science 2024-02-23 Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Data prefetching, i.e., the act of predicting application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness…

Hardware Architecture · Computer Science 2020-09-03 Mohammad Bakhshalipour, Mehran Shakerinava, Fatemeh Golshan, Ali Ansari, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Prefetching in Deep Memory Hierarchies with NVRAM as Main Memory

Emerging applications, such as big data analytics and machine learning, require increasingly large amounts of main memory, often exceeding the capacity of current commodity processors built on DRAM technology. To address this, recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Manel Lurbe, Miguel Avargues, Salvador Petit, Maria E. Gomez, Rui Yang, Guanhao Wang, Julio Sahuquillo

Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning

Fine-tuning large pre-trained language models on various downstream tasks with whole parameters is prohibitively expensive. Hence, Parameter-efficient fine-tuning has attracted attention that only optimizes a few task-specific parameters…

Computation and Language · Computer Science 2023-05-25 Zhen-Ru Zhang, Chuanqi Tan, Haiyang Xu, Chengyu Wang, Jun Huang, Songfang Huang

FineText: Text Classification via Attention-based Language Model Fine-tuning

Training deep neural networks from scratch on natural language processing (NLP) tasks requires significant amount of manually labeled text corpus and substantial time to converge, which usually cannot be satisfied by the customers. In this…

Computation and Language · Computer Science 2019-10-29 Yunzhe Tao, Saurabh Gupta, Satyapriya Krishna, Xiong Zhou, Orchid Majumder, Vineet Khare

Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

Memory performance is a bottleneck in graph analytics acceleration. Existing Machine Learning (ML) prefetchers struggle with phase transitions and irregular memory accesses in graph processing. We propose MPGraph, an ML-based Prefetcher for…

Machine Learning · Computer Science 2023-09-26 Pengmiao Zhang, Rajgopal Kannan, Viktor K. Prasanna

Prefix Propagation: Parameter-Efficient Tuning for Long Sequences

Parameter-efficient tuning aims to mitigate the large memory requirements of adapting pretrained language models for downstream tasks. For example, one popular method, prefix-tuning, prepends trainable tokens to sequences while freezing the…

Computation and Language · Computer Science 2023-05-26 Jonathan Li, Will Aitken, Rohan Bhambhoria, Xiaodan Zhu

Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning

Clustering is a core task in machine learning with wide-ranging applications in data mining and pattern recognition. However, its unsupervised nature makes it inherently challenging. Many existing clustering algorithms suffer from critical…

Machine Learning · Computer Science 2025-07-29 Ahmed Shokry, Ayman Khalafallah

AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics

Modern memory hierarchies work well with applications that have good spatial locality. Evolving (dynamic) graphs are important applications widely used to model graphs and networks with edge and vertex changes. They exhibit irregular memory…

Hardware Architecture · Computer Science 2024-06-21 Abhishek Singh, Christian Schulte, Xiaochen Guo

FANet: A Feedback Attention Network for Improved Biomedical Image Segmentation

The increase of available large clinical and experimental datasets has contributed to a substantial amount of important contributions in the area of biomedical image analysis. Image segmentation, which is crucial for any quantitative…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Nikhil Kumar Tomar, Debesh Jha, Michael A. Riegler, Håvard D. Johansen, Dag Johansen, Jens Rittscher, Pål Halvorsen, Sharib Ali