Related papers: DLS: Directoryless Shared Last-level Cache

Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip

Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing L1 caches beside the cores is the fastest after registers, but the size of private caches…

Performance · Computer Science 2016-09-27 Diman Zad Tootaghaj, Farshid Farhat

Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems

In modern Commercial Off-The-Shelf (COTS) multicore systems, each core can generate many parallel memory requests at a time. The processing of these parallel requests in the DRAM controller greatly affects the memory interference delay…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-29 Heechul Yun

Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction

Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive quadratic computational complexity and memory overhead during inference. Current caching techniques accelerate…

Computation and Language · Computer Science 2025-11-06 Yuerong Song, Xiaoran Liu, Ruixiao Li, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence

The disaggregated memory (DM) architecture offers high resource elasticity at the cost of data access performance. While caching frequently accessed data in compute nodes (CNs) reduces access overhead, it requires costly centralized…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-26 Hanze Zhang, Kaiming Wang, Rong Chen, Xingda Wei, Haibo Chen

Dynamic cache reconfiguration based techniques for improving cache energy efficiency

Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage…

Hardware Architecture · Computer Science 2013-10-17 Sparsh Mittal

DDS: DPU-optimized Disaggregated Storage [Extended Report]

This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). DPUs can optimize the latency and CPU consumption of disaggregated storage servers.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-29 Qizhen Zhang, Philip Bernstein, Badrish Chandramouli, Jiasheng Hu, Yiming Zheng

Cache Coherence Over Disaggregated Memory

Disaggregating memory from compute offers the opportunity to better utilize stranded memory in cloud data centers. It is important to cache data in the compute nodes and maintain cache coherence across multiple compute nodes. However, the…

Databases · Computer Science 2026-01-14 Ruihong Wang, Jianguo Wang, Walid G. Aref

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

Distributed Discrete Morse Sandwich: Efficient Computation of Persistence Diagrams for Massive Scalar Data

The persistence diagram, which describes the topological features of a dataset, is a key descriptor in Topological Data Analysis. The "Discrete Morse Sandwich" (DMS) method has been reported to be the most efficient algorithm for computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-27 Eve Le Guillou, Pierre Fortin, Julien Tierny

Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems

Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory hierarchies that enhance memory-level parallelism (MLP), which is crucial for high performance. To support high MLP, shared last-level caches (LLCs) are…

Hardware Architecture · Computer Science 2025-07-23 Connor Sullivan, Alex Manley, Mohammad Alian, Heechul Yun

Practical Detectability for Persistent Lock-Free Data Structures

Persistent memory (PM) is an emerging class of storage technology that combines the benefits of DRAM and SSD. This characteristic inspires research on persistent objects in PM with fine-grained concurrency control. Among such objects,…

Programming Languages · Computer Science 2022-03-16 Kyeongmin Cho, Seungmin Jeon, Jeehoon Kang

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Current day processors employ multi-level cache hierarchy with one or two levels of private caches and a shared last-level cache (LLC). An efficient cache replacement policy at LLC is essential for reducing the off-chip memory transfer as…

Hardware Architecture · Computer Science 2013-07-25 Bijay Paikaray

A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices

In modern server CPUs, the Last-Level Cache (LLC) serves not only as a victim cache for higher-level private caches but also as a buffer for low-latency DMA transfers between CPU cores and I/O devices through Direct Cache Access (DCA).…

Hardware Architecture · Computer Science 2025-06-16 Haneul Park, Jiaqi Lou, Sangjin Lee, Yifan Yuan, Kyoung Soo Park, Yongseok Son, Ipoom Jeong, Nam Sung Kim

Reuse Distance-based Copy-backs of Clean Cache Lines to Lower-level Caches

Cache plays a critical role in reducing the performance gap between CPU and main memory. A modern multi-core CPU generally employs a multi-level hierarchy of caches, through which the most recently and frequently used data are maintained in…

Hardware Architecture · Computer Science 2021-06-01 Rui Wang, Chundong Wang, Chongnan Ye

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Motivated by emerging applications to the edge computing paradigm, we introduce a two-layer erasure-coded fault-tolerant distributed storage system offering atomic access for read and write operations. In edge computing, clients interact…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-01 Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Medard

In-Storage Domain-Specific Acceleration for Serverless Computing

While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage disaggregation in the system infrastructure level and (3) integration of domain-specific accelerators…

Hardware Architecture · Computer Science 2024-03-26 Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, Hadi Esmaeilzadeh

Predictable Sharing of Last-level Cache Partitions for Multi-core Safety-critical Systems

Last-level cache (LLC) partitioning is a technique to provide temporal isolation and low worst-case latency (WCL) bounds when cores access the shared LLC in multicore safety-critical systems. A typical approach to cache partitioning…

Hardware Architecture · Computer Science 2022-04-05 Zhuanhao Wu, Hiren Patel

Effective Cache Apportioning for Performance Isolation Under Compiler Guidance

With a growing number of cores in modern high-performance servers, effective sharing of the last level cache (LLC) is more critical than ever. The primary agenda of such systems is to maximize performance by efficiently supporting…

Programming Languages · Computer Science 2022-10-04 Bodhisatwa Chatterjee, Sharjeel Khan, Santosh Pande

FSD-Inference: Fully Serverless Distributed Inference with Scalable Cloud Communication

Serverless computing offers attractive scalability, elasticity and cost-effectiveness. However, constraints on memory, CPU and function runtime have hindered its adoption for data-intensive applications and machine learning (ML) workloads.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-25 Joe Oakley, Hakan Ferhatosmanoglu

CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is shared. Since the increased number of requests from multiple processor cores becomes a bottleneck, scheduling the requests efficiently is…

Hardware Architecture · Computer Science 2019-07-19 Eduardo Olmedo Sanchez, Xian-He Sun