Related papers: Cache Replacement Algorithm
Graphics Processing Units (GPUs) were once used solely for graphical computation tasks but with the increase in the use of machine learning applications, the use of GPUs to perform general-purpose computing has increased in the last few…
Memory hierarchy is used to compete the processors speed. Cache memory is the fast memory which is used to conduit the speed difference of memory and processor. The access patterns of Level 1 cache (L1) and Level 2 cache (L2) are different,…
Caching techniques are widely used in the era of cloud computing from applications, such as Web caches to infrastructures, Memcached and memory caches in computer architectures. Prediction of cached data can greatly help improve cache…
LLMs are increasingly used world-wide from daily tasks to agentic systems and data analytics, requiring significant GPU resources. LLM inference systems, however, are slow compared to database systems, and inference performance and…
An efficient caching can be achieved by predicting the popularity of the files accurately. It is well known that the popularity of a file can be nudged by using recommendation, and hence it can be estimated accurately leading to an…
In this work, we propose MUSTACHE, a new page cache replacement algorithm whose logic is learned from observed memory access requests rather than fixed like existing policies. We formulate the page request prediction problem as a…
General Purpose Graphic Processing Unit(GPGPU) is used widely for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is very famous in the new era and mostly used for scientific computing which…
Proactive caching is an effective way to alleviate peak-hour traffic congestion by prefetching popular contents at the wireless network edge. To maximize the caching efficiency requires the knowledge of content popularity profile, which…
The rapid development of multi-core system and increase of data-intensive application in recent years call for larger main memory. Traditional DRAM memory can increase its capacity by reducing the feature size of storage cell. Now further…
Modern processors use cache memory: a memory access that "hits" the cache returns early, while a "miss" takes more time. Given a memory access in a program, cache analysis consists in deciding whether this access is always a hit, always a…
Caching is a technique to reduce peak traffic rates by prefetching popular content into memories at the end users. Conventionally, these memories are used to deliver requested content in part from a locally cached copy rather than through…
Cooperative caching is a technique used in mobile ad hoc networks to improve the efficiency of information access by reducing the access latency and bandwidth usage. Cache replacement policy plays a significant role in response time…
Memory latencies and bandwidth are major factors, limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based…
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…
Caching is crucial for enabling high-throughput networks for data intensive applications. Traditional caching technology relies on DRAM, as it can transfer data at a high rate. However, DRAM capacity is subject to contention by most system…
Caching and prefetching techniques are fundamental to modern computing, serving to bridge the growing performance gap between processors and memory. Traditional prefetching strategies are often limited by their reliance on predefined…
Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…
Storage resources and caching techniques permeate almost every area of communication networks today. In the near future, caching is set to play an important role in storage-assisted Internet architectures, information-centric networks, and…
Modern computer architectures share physical resources between different programs in order to increase area-, energy-, and cost-efficiency. Unfortunately, sharing often gives rise to side channels that can be exploited for extracting or…
In any caching system, the admission and eviction policies determine which contents are added and removed from a cache when a miss occurs. Usually, these policies are devised so as to mitigate staleness and increase the hit probability.…