Related papers: A Unified Framework for Quantitative Cache Analysi…
Modern computer architectures share physical resources between different programs in order to increase area-, energy-, and cost-efficiency. Unfortunately, sharing often gives rise to side channels that can be exploited for extracting or…
Cache persistence analysis is an important part of worst-case execution time (WCET) analysis. It has been extensively studied in the past twenty years. Despite these efforts, all existing persistence analyses are approximative in the sense…
This paper presents a comprehensive comparison of distributed caching algorithms employed in modern distributed systems. We evaluate various caching strategies including Least Recently Used (LRU), Least Frequently Used (LFU), Adaptive…
The balance metric is a simple approach to estimate the performance of bandwidth-limited loop kernels. However, applying the method to in-cache situations and modern multi-core architectures yields unsatisfactory results. This paper…
When compared to blocking concurrency, non-blocking concurrency can provide higher performance in parallel shared-memory contexts, especially in high contention scenarios. This paper proposes FLeeC, an application-level cache system based…
In this work a novel family of decentralised caching policies for wireless networks is introduced, referred to as spatial multi-LRU. These improve cache-hit probability by exploiting multi-coverage. Two variations are proposed, the…
In modern GPU inference, cache efficiency remains a major bottleneck, and heuristic policies such as \textsc{LRU} can perform far worse than the offline optimum. Existing learning-based caching systems improve hit rates mainly through…
FIFO queues are a fundamental data structure used in a wide range of applications. Concurrent FIFO queues allow multiple execution threads to access the queue simultaneously. Maintaining strict FIFO semantics in concurrent queues leads to…
For applications in worst-case execution time analysis and in security, it is desirable to statically classify memory accesses into those that result in cache hits, and those that result in cache misses. Among cache replacement policies,…
Large language models (LLMs) have demonstrated remarkable capabilities across a variety of tasks. One of the main challenges towards the successful deployment of LLMs is memory management, since they typically involve billions of…
On real-time systems running under timing constraints, scheduling can be performed when one is aware of the worst case execution time (WCET) of tasks. Usually, the WCET of a task is unknown and schedulers make use of safe…
Memory caches are being aggressively used in today's data-parallel frameworks such as Spark, Tez and Storm. By caching input and intermediate data in memory, compute tasks can witness speedup by orders of magnitude. To maximize the chance…
Adaptive Replacement Cache (ARC) and CLOCK with Adaptive Replacement (CAR) are state-of-the- art "adaptive" cache replacement algorithms invented to improve on the shortcomings of classical cache replacement policies such as LRU, LFU and…
The size of computer networks, along with their bandwidths, is growing exponentially. To support these large, high-speed networks, it is neccessary to be able to forward packets in a few microseconds. One part of the forwarding operation…
The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…
WCET (Worst-Case Execution Time) estimation on multicore architecture is particularly challenging mainly due to the complex accesses over cache shared by multiple cores. Existing analysis identifies possible contentions between parallel…
Modern computer architectures rely on caches to reduce the latency gap between the CPU and main memory. While indispensable for performance, caches pose a serious threat to security because they leak information about memory access patterns…
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid techniques, facilitating their understanding and use. We…
We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter's I/O…
This article introduces a novel family of decentralised caching policies, applicable to wireless networks with finite storage at the edge-nodes (stations). These policies, that are based on the Least-Recently-Used replacement principle, are…