RAC: Relation-Aware Cache Replacement for Large Language Models

Databases 2026-02-26 v1

Abstract

The scaling of Large Language Model (LLM) services faces significant cost and latency challenges, making effective caching under tight capacity crucial. Existing cache replacement policies, from heuristics to learning-based methods, predominantly rely on limited-window statistics such as recency and frequency. We show these signals are not robust for real-world LLM workloads, which exhibit long reuse distances and sparse local recurrence. To address these limitations, we propose Relation-Aware Cache (RAC), an online eviction strategy that leverages semantic relations among requests to guide eviction decisions. RAC synthesizes two relation-aware signals: (1) Topical Prevalence, which aggregates access evidence at the topic level to capture long-horizon reuse; and (2) Structural Importance, which leverages local intra-topic dependency structure to discriminate entries by their future reuse value. Extensive evaluations show that RAC maintains high effectiveness across diverse workloads, consistently surpassing state-of-the-art baselines by 20% to 30% in cache hit ratio.
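
The abstract does not state how RAC computes or combines its two signals, so the following is only a toy sketch of the general idea, not the authors' method. It assumes each request arrives with a topic label and an optional same-topic parent it depends on, approximates Topical Prevalence by the topic's share of all accesses seen so far, approximates Structural Importance by a count of cached dependents, and mixes the two with a hypothetical weight `alpha`. The class name, the scoring formula, and every parameter are illustrative assumptions.

```python
class ToyRelationAwareCache:
    """Toy eviction policy: evict the entry with the lowest combined score.

    Illustrative only; the scoring below is an assumption, not RAC itself.
    """

    def __init__(self, capacity, alpha=0.5):
        self.capacity = capacity
        self.alpha = alpha        # hypothetical mixing weight for the two signals
        self.entries = {}         # key -> topic of the cached entry
        self.topic_hits = {}      # topic -> accesses seen so far (never windowed)
        self.dependents = {}      # key -> number of same-topic entries depending on it

    def score(self, key):
        topic = self.entries[key]
        total = sum(self.topic_hits.values())
        # Stand-in for topical prevalence: topic's share of all accesses.
        prevalence = self.topic_hits[topic] / total
        # Stand-in for structural importance: normalized dependent count.
        max_dep = max(self.dependents.values(), default=0)
        importance = self.dependents.get(key, 0) / max_dep if max_dep else 0.0
        return self.alpha * prevalence + (1 - self.alpha) * importance

    def access(self, key, topic, depends_on=None):
        """Record one request; return True on a cache hit."""
        self.topic_hits[topic] = self.topic_hits.get(topic, 0) + 1
        if key in self.entries:
            return True
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self.score)
            del self.entries[victim]
            self.dependents.pop(victim, None)
        self.entries[key] = topic
        # Count the dependency only if the parent is cached under the same topic.
        if depends_on != key and self.entries.get(depends_on) == topic:
            self.dependents[depends_on] = self.dependents.get(depends_on, 0) + 1
        return False


cache = ToyRelationAwareCache(capacity=2)
cache.access("q1", "sql")
cache.access("q2", "sql", depends_on="q1")
cache.access("q3", "cooking")   # cache is full: q2 has no dependents and is evicted
```

In this toy run, `q1` and `q2` share the same topic score, so the dependency count decides the victim. Topic counts are kept for the whole run rather than a recent window, which is the sketch's loose analogue of the long-horizon evidence the abstract describes.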

Cite

@article{arxiv.2602.21547,
  title  = {RAC: Relation-Aware Cache Replacement for Large Language Models},
  author = {Yuchong Wu and Zihuan Xu and Wangze Ni and Peng Cheng and Lei Chen and Xuemin Lin and Heng Tao Shen and Kui Ren},
  journal= {arXiv preprint arXiv:2602.21547},
  year   = {2026}
}