Related papers: LRC: Dependency-Aware Cache Management for Data An…

LERC: Coordinated Cache Management for Data-Parallel Systems

Memory caches are being aggressively used in today's data-parallel frameworks such as Spark, Tez and Storm. By caching input and intermediate data in memory, compute tasks can see speedups of orders of magnitude. To maximize the chance…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-29 · Yinghao Yu, Wei Wang, Jun Zhang, Khaled B. Letaief

Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations

This paper presents a comprehensive comparison of distributed caching algorithms employed in modern distributed systems. We evaluate various caching strategies including Least Recently Used (LRU), Least Frequently Used (LFU), Adaptive…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-04-04 · Helen Mayer, James Richards

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such…

Performance · Computer Science · 2018-05-09 · Zhengyu Yang, Danlin Jia, Stratis Ioannidis, Ningfang Mi, Bo Sheng

Inferring Causal Relationships to Improve Caching for Clients with Correlated Requests: Applications to VR

Efficient edge caching reduces latency and alleviates backhaul congestion in modern networks. Traditional caching policies, such as Least Recently Used (LRU) and Least Frequently Used (LFU), perform well under specific request patterns. LRU…

Networking and Internet Architecture · Computer Science · 2025-12-10 · Agrim Bari, Gustavo de Veciana, Yuqi Zhou

RAC: Relation-Aware Cache Replacement for Large Language Models

The scaling of Large Language Model (LLM) services faces significant cost and latency challenges, making effective caching under tight capacity crucial. Existing cache replacement policies, from heuristics to learning-based methods,…

Databases · Computer Science · 2026-02-26 · Yuchong Wu, Zihuan Xu, Wangze Ni, Peng Cheng, Lei Chen, Xuemin Lin, Heng Tao Shen, Kui Ren

On Resource Pooling and Separation for LRU Caching

Caching systems using the Least Recently Used (LRU) principle have now become ubiquitous. A fundamental question for these systems is whether the cache space should be pooled together or divided to serve multiple flows of data item requests…

Performance · Computer Science · 2017-08-08 · Jian Tan, Guocong Quan, Kaiyi Ji, Ness Shroff

On the complexity of cache analysis for different replacement policies

Modern processors use cache memory: a memory access that "hits" the cache returns early, while a "miss" takes more time. Given a memory access in a program, cache analysis consists in deciding whether this access is always a hit, always a…

Programming Languages · Computer Science · 2019-09-24 · David Monniaux, Valentin Touzeau

Addressing Variability in Reuse Prediction for Last-Level Caches

Last-Level Cache (LLC) represents the bulk of a modern CPU processor's transistor budget and is essential for application performance as LLC enables fast access to data in contrast to much slower main memory. However, applications with…

Hardware Architecture · Computer Science · 2020-06-16 · Priyank Faldu

Generalization of LRU Cache Replacement Policy with Applications to Video Streaming

Caching plays a crucial role in networking systems to reduce the load on the network and is commonly employed by content delivery networks (CDNs) in order to improve performance. One of the commonly used mechanisms, Least Recently Used…

Networking and Internet Architecture · Computer Science · 2019-06-25 · Eric Friedlander, Vaneet Aggarwal

Tail-Optimized Caching for LLM Inference

Prompt caching is critical for reducing latency and cost in LLM inference: OpenAI and Anthropic report up to 50-90% cost savings through prompt reuse. Despite its widespread success, little is known about what constitutes an optimal prompt…

Systems and Control · Electrical Eng. & Systems · 2025-10-20 · Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng

A Fast Analytical Model of Fully Associative Caches

While the cost of computation is an easy-to-understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the…

Performance · Computer Science · 2020-01-07 · Tobias Gysi, Tobias Grosser, Laurin Brandner, Torsten Hoefler

Spatial multi-LRU: Distributed Caching for Wireless Networks with Coverage Overlaps

This article introduces a novel family of decentralised caching policies, applicable to wireless networks with finite storage at the edge-nodes (stations). These policies, which are based on the Least-Recently-Used replacement principle, are…

Networking and Internet Architecture · Computer Science · 2016-12-14 · Anastasios Giovanidis, Apostolos Avranas

Cache Mechanism for Agent RAG Systems

Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance,…

Computation and Language · Computer Science · 2025-11-06 · Shuhang Lin, Zhencan Peng, Lingyao Li, Xiao Lin, Xi Zhu, Yongfeng Zhang

Analyzing Adaptive Cache Replacement Strategies

Adaptive Replacement Cache (ARC) and CLOCK with Adaptive Replacement (CAR) are state-of-the-art "adaptive" cache replacement algorithms invented to improve on the shortcomings of classical cache replacement policies such as LRU, LFU and…

Data Structures and Algorithms · Computer Science · 2017-04-25 · Mario E. Consuegra, Wendy A. Martinez, Giri Narasimhan, Raju Rangaswami, Leo Shao, Giuseppe Vietri

Fast and exact analysis for LRU caches

For applications in worst-case execution time analysis and in security, it is desirable to statically classify memory accesses into those that result in cache hits, and those that result in cache misses. Among cache replacement policies,…

Programming Languages · Computer Science · 2018-12-21 · Claire Maïza, Valentin Touzeau, David Monniaux, Jan Reineke

Cache Persistence Analysis: Finally Exact

Cache persistence analysis is an important part of worst-case execution time (WCET) analysis. It has been extensively studied in the past twenty years. Despite these efforts, all existing persistence analyses are approximative in the sense…

Programming Languages · Computer Science · 2025-07-22 · Gregory Stock, Sebastian Hahn, Jan Reineke

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Current-day processors employ a multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at the LLC is essential for reducing the off-chip memory transfer as…

Hardware Architecture · Computer Science · 2013-07-25 · Bijay Paikaray

Reuse Cache for Heterogeneous CPU-GPU Systems

It is generally observed that the fraction of live lines in shared last-level caches (SLLC) is very small for chip multiprocessors (CMPs). This can be tackled using promotion-based replacement policies like re-reference interval prediction…

Hardware Architecture · Computer Science · 2021-07-30 · Tejas Shah, Bobbi Yogatama, Kyle Roarty, Rami Dahman

Toward Robust and Efficient ML-Based GPU Caching for Modern Inference

In modern GPU inference, cache efficiency remains a major bottleneck, and heuristic policies such as LRU can perform far worse than the offline optimum. Existing learning-based caching systems improve hit rates mainly through…

Machine Learning · Computer Science · 2026-04-27 · Peng Chen, Jiaji Zhang, Hailiang Zhao, Yirong Zhang, Shenyao Chen, Jiahong Yu, Xueyan Tang, Yixuan Wang, Hao Li, Jianping Zou, Gang Xiong, Kingsum Chow, Shuibing He, Shuiguang Deng

Reducing DRAM Access Latency by Exploiting DRAM Leakage Characteristics and Common Access Patterns

DRAM-based memory is a critical factor that creates a bottleneck on the system performance since the processor speed largely outperforms the DRAM latency. In this thesis, we develop a low-cost mechanism, called ChargeCache, which enables…

Hardware Architecture · Computer Science · 2016-09-26 · Hasan Hassan