Related papers: Multilevel Bidirectional Cache Filter
We study the decentralized caching scheme in a two-layer network, which includes a sever, multiple helpers, and multiple users. Basically, the proposed caching scheme consists of two phases, i.e, placement phase and delivery phase. In the…
The Bicameral Cache is a cache organization proposal for a vector architecture that segregates data according to their access type, distinguishing scalar from vector references. Its aim is to avoid both types of references from interfering…
Diffusion models have emerged as state-of-the-art in image generation, but their practical deployment is hindered by the significant computational cost of their iterative denoising process. While existing caching techniques can accelerate…
Distributed storage systems typically maintain strong consistency between data nodes and metadata nodes by adopting ordered writes: 1) first installing data; 2) then updating metadata to make data visible.We propose SwitchDelta to…
Traditional on-die, three-level cache hierarchy design is very commonly used but is also prone to latency, especially at the Level 2 (L2) cache. We discuss three distinct ways of improving this design in order to have better performance.…
Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. Client-side caching of frequently accessed metadata can mitigate server loads, but…
We propose a new decentralized coded caching scheme for a two-phase caching network, where the data placed in user caches in the prefetching phase are random portions of a maximal distance separable (MDS) coded version of the original…
Modern storage systems predominantly use flash-based SSDs as a cache layer due to their favorable performance and cost efficiency. However, in tiny-object workloads, existing flash cache designs still suffer from high write amplification.…
High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is…
High main memory latency continues to limit performance of modern high-performance out-of-order cores. While DRAM latency has remained nearly the same over many generations, DRAM bandwidth has grown significantly due to higher frequencies,…
In cost-sensitive deployments, RAID arrays may combine SSDs with different performance levels. Such heterogeneity arises when aging SSDs degrade yet remain usable, or when failed drives are replaced with new devices of explicitly better…
Current day processors employ multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at LLC is essential for reducing the off-chip memory transfer as…
Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer…
As its price per bit drops, SSD is increasingly becoming the default storage medium for cloud application databases. However, it has not become the preferred storage medium for key-value caches, even though SSD offers more than 10x lower…
The growing demand for efficient cloud storage solutions has led to the widespread adoption of Solid-State Drives (SSDs) for caching in cloud block storage systems. The management of data writes to SSD caches plays a crucial role in…
For large scale distributed storage systems, flash memories are an excellent choice because flash memories consume less power, take lesser floor space for a target throughput and provide faster access to data. In a traditional distributed…
The rapid development of multi-core system and increase of data-intensive application in recent years call for larger main memory. Traditional DRAM memory can increase its capacity by reducing the feature size of storage cell. Now further…
Caching at the network edge has emerged as a viable solution for alleviating the severe capacity crunch in modern content centric wireless networks by leveraging network load-balancing in the form of localized content storage and delivery.…
Application-level caching is a form of caching that has been increasingly adopted to satisfy performance and throughput requirements. The key idea is to store the results of a computation, to improve performance by reusing instead of…
With the rapidly growing demand of graph processing in the real scene, they have to efficiently handle massive concurrent jobs. Although existing work enable to efficiently handle single graph processing job, there are plenty of memory…