Related papers: LERC: Coordinated Cache Management for Data-Parall…

LRC: Dependency-Aware Cache Management for Data Analytics Clusters

Memory caches are being aggressively used in today's data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies--notably the Least Recently Used (LRU) policy--that are…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-27 Yinghao Yu, Wei Wang, Jun Zhang, Khaled Ben Letaief

Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling

Understanding the performance of data-parallel workloads when resource-constrained has significant practical importance but unfortunately has received only limited attention. This paper identifies, quantifies and demonstrates memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-15 Calin Iorgulescu, Florin Dinu, Aunn Raza, Wajih Ul Hassan, Willy Zwaenepoel

FLeeC: a Fast Lock-Free Application Cache

When compared to blocking concurrency, non-blocking concurrency can provide higher performance in parallel shared-memory contexts, especially in high contention scenarios. This paper proposes FLeeC, an application-level cache system based…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-17 André J. Costa, Nuno M. Preguiça, João M. Lourenço

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such…

Performance · Computer Science 2018-05-09 Zhengyu Yang, Danlin Jia, Stratis Ioannidis, Ningfang Mi, Bo Sheng

Limited Associativity Makes Concurrent Software Caches a Breeze

Software caches optimize the performance of diverse storage systems, databases and other software systems. Existing works on software caches automatically resort to fully associative cache designs. Our work shows that limited associativity…

Hardware Architecture · Computer Science 2021-09-08 Dolev Adas, Gil Einziger, Roy Friedman

RLCache: Automated Cache Management Using Reinforcement Learning

This study investigates the use of reinforcement learning to guide a general purpose cache manager decisions. Cache managers directly impact the overall performance of computer systems. They govern decisions about which objects should be…

Machine Learning · Computer Science 2019-10-01 Sami Alabed

BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions

Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim…

Cryptography and Security · Computer Science 2024-06-13 Quancheng Wang, Xige Zhang, Han Wang, Yuzhe Gu, Ming Tang

Improving Spark Application Throughput Via Memory Aware Task Co-location: A Mixture of Experts Approach

Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications. Many of these applications are not latency-sensitive and thus can run as batch jobs in data centers. By…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-03 Vicent Sanz Marco, Ben Taylor, Barry Porter, Zheng Wang

HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

The system-level cache is a critical resource shared by processor cores and domain-specific accelerators in heterogeneous systems on chips (SoCs). The strict QoS requirements of accelerators, such as deadlines, can lead to severe…

Hardware Architecture · Computer Science 2026-05-21 Ayushi Agarwal, Anannya Mathur, Preeti Ranjan Panda

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately,…

Hardware Architecture · Computer Science 2017-05-11 Youyou Lu, Jiwu Shu, Long Sun, Onur Mutlu

Data Cache Prefetching with Perceptron Learning

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang, Zhiwei Luo

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

Toward Robust and Efficient ML-Based GPU Caching for Modern Inference

In modern GPU inference, cache efficiency remains a major bottleneck, and heuristic policies such as LRU can perform far worse than the offline optimum. Existing learning-based caching systems improve hit rates mainly through…

Machine Learning · Computer Science 2026-04-27 Peng Chen, Jiaji Zhang, Hailiang Zhao, Yirong Zhang, Shenyao Chen, Jiahong Yu, Xueyan Tang, Yixuan Wang, Hao Li, Jianping Zou, Gang Xiong, Kingsum Chow, Shuibing He, Shuiguang Deng

Making Caches Work for Graph Analytics

Modern hardware systems are heavily underutilized when running large-scale graph applications. While many in-memory graph frameworks have made substantial progress in optimizing these applications, we show that it is still possible to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-15 Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Matei Zaharia, Saman Amarasinghe

Read-Tuned STT-RAM and eDRAM Cache Hierarchies for Throughput and Energy Enhancement

As capacity and complexity of on-chip cache memory hierarchy increases, the service cost to the critical loads from Last Level Cache (LLC), which are frequently repeated, has become a major concern. The processor may stall for a…

Hardware Architecture · Computer Science 2016-08-09 Navid Khoshavi, Xunchao Chen, Jun Wang, Ronald F. DeMara

Reducing End-to-End Latency of Cause-Effect Chains with Shared Cache Analysis

Cause-effect chains, as a widely used modeling method in real-time embedded systems, are extensively applied in various safety-critical domains. End-to-end latency, as a key real-time attribute of cause-effect chains, is crucial in many…

Systems and Control · Electrical Eng. & Systems 2026-01-29 Yixuan Zhu, Yinkang Gao, Bo Zhang, Xiaohang Gong, Binze Jiang, Lei Gong, Wenqi Lou, Teng Wang, Chao Wang, Xi Li, Xuehai Zhou

Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations

This paper presents a comprehensive comparison of distributed caching algorithms employed in modern distributed systems. We evaluate various caching strategies including Least Recently Used (LRU), Least Frequently Used (LFU), Adaptive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-04 Helen Mayer, James Richards

Memory at Your Service: Fast Memory Allocation for Latency-critical Services

Co-location and memory sharing between latency-critical services, such as key-value store and web search, and best-effort batch jobs is an appealing approach to improving memory utilization in multi-tenant datacenter systems. However, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Aidi Pi, Junxian Zhao, Shaoqi Wang, Xiaobo Zhou

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Current day processors employ multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at LLC is essential for reducing the off-chip memory transfer as…

Hardware Architecture · Computer Science 2013-07-25 Bijay Paikaray